DOCUMENT RESUME 

ED 071 059 CS 000 326 



AUTHOR          Blanton, William, Ed.; And Others
TITLE           Reading Tests for the Secondary Grades: A Review and
                Evaluation. Reading Aid Series.
INSTITUTION     International Reading Association, Newark, Del.
PUB DATE        72
NOTE            60p.
AVAILABLE FROM  International Reading Association, 6 Tyre Avenue,
                Newark, Del. 19711 ($2.00 non-member, $1.75
                member)



EDRS PRICE      MF-$0.65 HC-$3.29
DESCRIPTORS     Achievement Tests; Developmental Reading; Informal
                Reading Inventory; Reading Instruction; *Reading
                Tests; *Secondary Education; Standardized Tests;
                *Test Reviews; *Test Selection



ABSTRACT 

This booklet presents ideas for utilizing 
standardized reading tests and an informal inventory to determine the 
reading levels of secondary school students. Comprehensive reviews of 
the most commonly used standardized tests for high school students 
and information about their construction, standardization, 
administration, and use make up the bulk of the text. The rest of the
book deals with the criteria used in reviewing the tests and a short
chapter detailing how to select a reading achievement test. The book 
is intended primarily for classroom teachers and other personnel 
directly concerned with selecting reading achievement tests; 
information is thus not presented in highly technical or statistical 
terms. (TO) 



Reading Aids Series

Charles T. Mangrum, Series Editor



University of Miami 



U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
OFFICE OF EDUCATION
THIS DOCUMENT HAS BEEN REPRODUCED
EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING
IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY
REPRESENT OFFICIAL OFFICE OF EDUCATION
POSITION OR POLICY.



Reading Tests 

for the Secondary Grades: 

A Review and Evaluation 



William Blanton, Roger Farr, and J. Jaap Tuinman 
Editors 

Indiana University 




An IRA Service Bulletin



Published by the 

INTERNATIONAL READING ASSOCIATION • Newark, Delaware 



INTERNATIONAL READING ASSOCIATION 

OFFICERS 
1971-1972

President Theodore L. Harris, University of Puget Sound,
Tacoma, Washington

President-Elect William K. Durr, Michigan State University,
East Lansing, Michigan

Past President Donald L. Cleland, University of Pittsburgh, 
Pittsburgh, Pennsylvania 



DIRECTORS 

Term Expiring Spring 1972 

Thomas C. Barrett, University of Wisconsin, Madison, Wisconsin
Constance M. McCullough, San Francisco State College,
San Francisco, California
Eileen E. Sargent, Nicolet Union High School, Milwaukee, Wisconsin

Term Expiring Spring 1973 

Marjorie S. Johnson, Temple University, Philadelphia, Pennsylvania
Robert Karlin, Queens College, City University of New York,
Flushing, New York
Olive S. Niles, State Department of Education, Hartford, Connecticut 

Term Expiring Spring 1974 

William Eller, State University of New York, Buffalo, New York 
William J. Iverson, Stanford University, Stanford, California
Eunice Shaed Newton, Howard University, Washington, D.C.

Executive Secretary-Treasurer Ralph C. Staiger, University of Delaware,
Newark, Delaware

Assistant Executive Secretary Ronald W. Mitchell, International Reading
Association, Newark, Delaware 

Publications Coordinator Faye R. Branca, International Reading Association, 
Newark, Delaware 



Copyright 1972 by the 

International Reading Association, Inc. 

Library of Congress Catalog Card Number 70-190453 



PERMISSION TO REPRODUCE THIS COPYRIGHTED
MATERIAL HAS BEEN GRANTED BY
International Reading Association
TO ERIC AND ORGANIZATIONS OPERATING
UNDER AGREEMENTS WITH THE U.S. OFFICE
OF EDUCATION. FURTHER REPRODUCTION
OUTSIDE THE ERIC SYSTEM REQUIRES PERMISSION
OF THE COPYRIGHT OWNER.



CONTENTS

Foreword

Criteria for Reviewing Tests

Selecting a Reading Achievement Test

Test Reviews

California Achievement Tests: Reading
Cooperative English Tests - Reading Section
Davis Reading Test
Diagnostic Reading Tests
Gates-MacGinitie Reading Tests
Iowa Silent Reading Tests
Metropolitan Achievement Tests: Reading, Advanced Level
Nelson-Denny Reading Test
Nelson Reading Test
Sequential Tests of Educational Progress
SRA Achievement Series (Multilevel Edition)
Stanford Achievement Test High School Reading
Traxler High School Reading Test - Revised
Traxler Silent Reading Test

Appendix



This Reading Aid was developed under the auspices of
the members of the IRA Evaluation of Tests Committee:

William Blanton
Lawrence M. Kasdon
Frederick B. Davis
Carolyn E. Massad
Roger Farr
Nancy Roser
Walter Hill
Robert Schreiner
Marjorie Seddon Johnson
J. Jaap Tuinman



FOREWORD 



The need for classes in developmental reading at the secondary school level
was recognized before World War II, and a few schools pioneered in
offering such courses. In the decades that followed, secondary schools
increasingly saw the need and provided some students, at least, with the
opportunity to acquire reading skills commensurate with their levels of
ability, achievement, and purposes. A few schools instituted reading
courses for all their students, and some forward-looking state departments
of education required a course in the teaching of reading for initial
certification of all secondary school teachers.

If the secondary school accepts the obligation of teaching all students
to read at the level of their abilities, then the materials and methods
used should be appropriate to the levels of the learners. How to
determine these levels is a continual problem, for the longer the students
stay in school, the greater the range of achievement becomes within a
[illegible].

The Reading Aids Series was designed to provide practical suggestions,
and this one presents ideas for utilizing standardized tests along with an
informal inventory to determine the reading levels of secondary school
students. It offers comprehensive reviews of the most commonly
used tests and gives information
about their construction, standardization, administration, and use.
Furthermore, the booklet serves as a kind of sequel to another IRA
publication, Tests of Reading Readiness and Achievement: A Review and
Evaluation by Roger Farr and Nicholas Anastasiow, 1971.

Teachers in the secondary school, whether specifically assigned to teach
classes in reading or in other subjects, should find this collection helpful
and practical, for it brings together data that would be difficult and
time-consuming to assemble. The authors have completed a task that
ought to facilitate reading improvement for the pupils. The International
Reading Association takes pleasure in presenting this newest Reading Aid
to its members and others interested in reading instruction.



Helen Huus, President
International Reading Association
1969-1970






Chapter 1 



CRITERIA FOR REVIEWING TESTS* 
• Why Such A Book As This One? 

This book is intended primarily for classroom teachers and other personnel
who are directly concerned with selecting reading achievement tests. One
may ask, "Does one really need a guide to select a test?" All readers have
probably had a course in tests and measurements and know the general
rules for selecting an achievement test. Many, however, had the course
before actually teaching so that theory was too removed from practice
and, therefore, was not so useful as it could have been. But, more
importantly, test development has made rapid advancement in theory and
practice in recent years.

Selecting a reading achievement test is continually becoming a more 
complex task with these advancements. Test manufacturing has become a 
large scale enterprise with attractive and highly promoted reading achieve- 
ment, assessment, and diagnostic devices. Some of these instruments are 
based on new research evidence on how children learn to read. Other tests 
are designed specifically to measure experimental programs, rather than 
the more traditional approaches. 

The computer has also made an impact on test construction. Rapid
analyses of the statistical characteristics of a test are now possible. In the
past it would have taken months or years to analyze the results of each
item on a test given to a large sample of children. Using rapid analysis
techniques, the computer has enabled test manufacturers to revise their
tests more frequently, and the revision of old tests is based on more
accurate and complete information about the effectiveness of each test
question.

Old tests, however, remain in the schools long after the curriculum has
been changed. These tests are outdated and no longer serve the purpose for
which they were originally designed. Yet, on the other hand, some of the
older tests still are the "best" that are currently available. How does a
teacher choose among them? Selecting a test takes time and careful
evaluation, more time than the classroom teacher has to give from his
other instructional duties. This monograph is designed to review the major
issues that should be considered before a test is chosen and used in a
classroom.

The authors have reviewed several of the most commonly used reading
achievement tests currently available for use with high school students and
have evaluated these instruments as to both their content and statistical
characteristics. An analysis of the research reports from the ERIC
Clearinghouse on Retrieval of Information and Evaluation on Reading was
consulted in an attempt to determine which reading tests were being used
most often.

*This chapter is a reprint, with minor adaptations, from R. Farr and N. Anastasiow,
Tests of Reading Readiness and Achievement: A Review and Evaluation,
International Reading Association, Newark, Delaware, 1971.

These test reviews will hopefully serve as a guide for evaluation in
selecting the appropriate test for use in a specific classroom. This guide
should reduce the time normally spent in evaluating reading tests. The
issues considered by the reviewers in evaluating the tests are the content
measured by the test, its statistical properties, its scorability, the meaning
of the subtest and total test scores, and whether the test measures
adequately what it purports to measure. Although the results are
summarized, it may be useful to review the purposes and uses of achievement
tests.

• Why Use a Commercially Prepared 
Reading Achievement Test? 

Prediction and Assessment 

One's observation of a student's daily performance is the main source
for determining how well a student is doing. One will also, however, want
to make periodic controlled assessment of each student's current reading
ability in order to place him at his appropriate instructional level. Teachers
are aware that a student makes the most rapid progress when instruction is
near his current level of mastery. Thus, tests help teachers make initial,
rough assessments so that instruction can begin with a better probability
of success.

Teacher-made tests are one of the main sources of gathering data about 
children in a classroom. These results help one to predict future achieve- 
ment, assess how well children have accomplished the goals, and provide 
feedback to the child as well as reinforce the student for what he has 
accomplished. However useful these results may be, teachers, parents, and
administrators are prone to want some outside assessment of how well the 
students are doing when compared with a large sample of children of the 
same age and grade. Teachers have available a limited number of children 
in a class to compare how well that class or an individual student is 
progressing. Thus, commercially prepared tests are used to provide wider 
prediction and assessment of the pupils in a class. 

There are other uses of tests besides those listed previously. A school
district may wish to look at the general achievement level of its students.
This district assessment may help the administration make suggestions for
program improvement, purchasing additional instructional aids and
equipment, or providing additional personnel. In addition, tests are used for
research purposes to evaluate the effectiveness of a new program or to
compare two modes of instruction. Any of the criteria to be described are
relevant for these uses of tests as well.

• Factors to be Considered in Choosing a Test 
Special Norms 

A commercially prepared test usually offers the advantage of having
been administered to a large number of children from a wide variety of
rural and urban centers. Usually these tests have been administered to
children of various social, racial, and ability levels. Thus, the test will have
been "normed" on a population of children from more than just one class,
school district, or state. A description of the norming population is critical
for an interpretation of test scores. If one has a bright, urban class and the
test has been normed with average, inner-city children, the scores one's
children obtain may indicate higher grade scores for that class than is a
realistic assessment. If the reverse is true, that the test originally has been
given to a large population of bright youngsters, the scores may be lower
than is a realistic appraisal of one's students' current status.

It has been realized for some time now that in many situations national
norms are not always the most appropriate. Lately some publishers have
started to include norms specific to a particular geographic region or a
particular educational reference group. This development is to be
considered fortunate.

Standardization 

Clear, standardized directions on how the test is to be administered are 
desirable. A set of directions that are concise and uniform will ensure that
the results are not depressed or inflated because the directions left the
procedure unclear. The students' scores will not be so useful if the test is 
given in a different way from the way it was given to the norming 
population. 

Objectivity 

A commercially prepared test also is intended to be objective: i.e., the 
score achieved should not be biased in some way by the tester or observer
of the child's demonstration of what he knows. Encouragement, as every- 
one knows, can guide a pupil to a right answer. This is an excellent 
instructional technique as guided-discovery experiments have demon- 
strated. At times, however, one will want to know not how much a pupil 
can learn but how much he has learned and where he is now. An objective 
measure should enable one to determine this. As one shall see, tests vary in
their objectivity. 

Ease of Administration and Scorability 

Given enough time and personnel, a teacher might collect extensive 
data about a child. This undertaking is not possible in most instances. 
Teachers want a test that makes reasonable demands in terms of the 
amount of time needed to administer the test so that children are not 
fatigued and also so the classroom instructional program may continue. In 
addition, tests that are difficult and tedious to score are sources of error 
and use far more teacher time than is desirable. Most achievement tests are 
designed to minimize the scoring time required of teachers. 

Validity 

A reading achievement test should sample the decoding, vocabulary, 
and comprehension skills taught. The titles of the tests should be an 
accurate description of the skills being tested. Evidence should be given 
that the skills of the test were measured with the norming population. 






The test should also provide evidence that the skills measured are either
a measure of current status or are of predictive value. One should know 
which tests can be used to predict success or failure in subsequent 
instruction. Not all tests provide this kind of evidence. 

Three kinds of "validity" are important to consider. One is content
validity. A test is said to have content validity to the degree its items are
relevant both to the subject matter taught and the behaviors which the
teaching is aiming to produce. Second is concurrent validity, which
compares the test behavior to current performance as measured in some other
way. The third is predictive validity, which tells whether the score the
child receives can be used to predict how well he will do in the future. A
fourth, more difficult kind of validity, is construct validity, which refers to
the psychological processes represented by the behaviors exhibited by the
child during the test. For example, some reading tests claim that the
comprehension skills measured on the test evaluate the child's ability to
make inferences. Evidence should be offered by the manufacturer that the
test items do measure this ability.

Reliability 

When choosing a test one will want it to be a reliable measure of how
much a child knows or how well he is able to apply his skills. The test
results should not be a chance score with a child obtaining a high score by
luck, guessing, or other factors. The test should not be constructed so that
it gives the advantage to children who know only one thing well. The test
should be constructed so that one has confidence that the score the child
receives today will be similar to the score he would receive if the test were
to be readministered to the same child tomorrow.

The Test Manual 

It is the professional responsibility of the test maker to provide
sufficient and appropriate evidence for the user to judge whether a test fits his
purposes. Description of administration, norming, scoring, reliability, and
validity should be provided in the user's manual. The authors have used
the test manuals to evaluate the evidence provided and to assess in what
ways the test can be recommended for use.

• How Can One Use Test Results? 

Most achievement tests are group tests and provide a rough assessment
of how a child compares with the norming sample. Such tests are not
meant to be diagnostic, nor are they meant to give an accurate assessment
of functional reading levels. They are a rough and ready means of grouping
children for reading instruction. The grade placement score has little
instructional value. The percentile score is more useful but again requires
careful interpretation. If a test is used over a period of time, class norms
may be built for a particular school district.

One of the greater misuses of the group standardized reading tests is the
use of grade level norms as an indication of the level at which a student
ought to be given reading instruction. Because of the nature of
standardized tests, they are not appropriate for determining the reading level at
which the youngster can profitably receive instruction. Standardized
reading tests are developed from a group of items which are administered to a
particular norming group; the grade norm is based on the average number
of items that students get correct at a particular grade level. For example, a
score of 6.0 may indicate only that a youngster who is just beginning sixth
grade had 100 items correct. This score does not mean that the student
who had 100 items correct can necessarily read 6.0 grade level material.
The standardized tests were not meant to be criterion tests!

What we are suggesting is a procedure that might be used to determine
the level at which a youngster may be given instruction on the basis of his
standardized reading test score. Betts, in his 1942 book Foundations of
Reading Instruction, suggested three functional reading levels. These
functional reading levels are based on work that he and Patrick Killgallon had
done. Credit also is given to Thorndike for the idea.

The three functional reading levels are as follows: 1) independent
reading level, the level at which a youngster should be doing his leisure 
time reading; 2) instructional reading level, the level at which the young- 
ster should be given reading instruction and should be learning in the 
content areas; and 3) frustration level, the reading level which is too 
difficult for the youngster and which will probably lead to negative 
conditioning to reading. 

Usually, the various levels are defined in terms of performance on an
informal reading inventory. The independent level is identified by 99
percent or better word call, 90 percent or better comprehension, and
freedom from behavioral symptoms of tension and anxiety. The instructional
level is identified by 95 percent or better correct word call, 75
percent or better comprehension, and only slight signs of anxiety. The
frustration level implies 90 percent or less correct word call, less than 75
percent comprehension, and symptoms of nervousness, anxiety, and
frustration.
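
The percentage cutoffs above can be sketched as a small decision rule. The function name and the "borderline" label are illustrative assumptions, not part of the original criteria: the text leaves word call between 90 and 95 percent undefined, and it does not say whether one failing criterion alone signals frustration (this sketch assumes it does). The behavioral signs of tension and anxiety require observation and are not modeled.

```python
def functional_level(word_call_pct: float, comprehension_pct: float) -> str:
    """Classify one IRI passage using the percentage cutoffs quoted above.

    Hypothetical helper: behavioral symptoms are ignored, and results that
    fall between the stated cutoffs are reported as "borderline".
    """
    if word_call_pct >= 99 and comprehension_pct >= 90:
        return "independent"
    if word_call_pct >= 95 and comprehension_pct >= 75:
        return "instructional"
    # Assumption: either criterion alone is treated as a sign of frustration.
    if word_call_pct <= 90 or comprehension_pct < 75:
        return "frustration"
    # Word call above 90 but below 95 with adequate comprehension:
    # the text does not assign a level here.
    return "borderline"


print(functional_level(96, 80))
```

A teacher would still confirm any such label against observed behavior, since the percentages, as noted later in this chapter, are arbitrary.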

A grade level score from a standardized reading test more often than 
not places a youngster at his frustration reading level. This relationship, of
course, is dependent on the particular standardized test that is used and 
the particular material which is used for the informal reading inventory. 

A procedure which might be used by classroom teachers to determine
the functional reading levels (i.e., independent, instructional, and
frustration) that correspond to various scores on the standardized tests would
work something like this: The teacher would administer the usual
standardized test to his class. To some of his students he would then administer
an informal reading inventory (IRI), preferably based on the basal reader
which he was using for instruction. Youngsters to be tested with the IRI
would be selected from several points along the range of scores students
achieved on the standardized tests. Students should be selected for testing
at least from the bottom, middle, and top of the range of scores.
Additional points on the range could be sampled if time allowed. The teacher
would then determine the relationship between various raw scores on the
standardized reading tests and the functional reading levels on the informal
reading inventory. With data of this sort for several classes, he would find
it unnecessary to readminister the informal reading inventory, being able
to use the past performance of students to determine the levels at which
they ought to be given instruction.
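
The procedure just described might be sketched as follows. Everything named here is a hypothetical illustration: `pick_sample` chooses students spread from the bottom to the top of the raw score range, and `estimate_level` looks up a later raw score in the table of (raw score, IRI instructional grade level) pairs the teacher has accumulated. Using the nearest sampled score at or below the student's score is an assumption made for the sketch; the text does not prescribe a lookup rule.

```python
from bisect import bisect_right


def pick_sample(raw_scores: dict[str, int], points: int = 3) -> list[str]:
    """Pick students spread evenly from the lowest to the highest raw score."""
    if points < 2:
        raise ValueError("sample at least the bottom and top of the range")
    ranked = sorted(raw_scores, key=raw_scores.get)
    if points >= len(ranked):
        return ranked
    step = (len(ranked) - 1) / (points - 1)
    return [ranked[round(i * step)] for i in range(points)]


def estimate_level(table: list[tuple[int, int]], raw_score: int) -> int:
    """Return the IRI instructional level recorded for the highest sampled
    raw score that does not exceed raw_score (lowest entry if none does)."""
    ordered = sorted(table)
    scores = [score for score, _ in ordered]
    idx = bisect_right(scores, raw_score) - 1
    return ordered[max(idx, 0)][1]


# Illustrative data only: five students' standardized raw scores.
class_scores = {"a": 10, "b": 50, "c": 30, "d": 70, "e": 90}
to_test = pick_sample(class_scores)  # bottom, middle, and top of the range

# Illustrative table built after giving the IRI to sampled students.
table = [(72, 3), (98, 4), (121, 5)]
print(to_test, estimate_level(table, 110))
```

Rounding down to the nearest sampled score is the cautious choice, since the stated risk is placing a youngster too high.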



These procedures would enable a teacher to determine a functional
reading level that would correspond to a particular raw score on a
particular standardized reading test. For example, a student who scores 121 raw
score points on a standardized reading test might have a fourth grade
independent reading level, a fifth grade instructional reading level, and a
sixth grade frustration level. Such knowledge would allow the teacher to
utilize the standardized test scores to place each student at the
instructional reading level where he would have the greatest opportunity to
succeed. Two cautions, however, are in order: First, it is necessary to
obtain estimates of the reliability of the IRI to be used and, second, one
should always realize that the percentages which define the various levels
are arbitrary. Finally, due to the unreliability of both instruments it would
be necessary to check the various raw score "cut off" points by repeating
the procedure suggested previously.



• Plan of this Reading Aid 

Each test included in this review was assessed using the following
outline:

I. Test Overview 

A. Title 

B. Author(s) 

C. Publisher 

D. Date of publication - original, revised 

1. Manual 

2. Test 

E. Level and Forms 

1. Grade level

2. Individual or group 

3. Number of forms available 

F. Administration Time 

G. Scoring - hand or machine scorable 

H. Cost 

1. Question booklets - consumable or not

2. Answer sheets 

3. Manual 

II. Evaluation of Subtests and Items 

A. Description of subtests 

1. Given meaningful name - describe test adequately 

2. Is each subtest long enough to provide usable results? 

3. Sequential development of each subtest logical, and transitions 
smooth? 

B. Author's purpose reflected in selection of items 

C. Scoring ease and usability of tables 

D. Directions - clarity and level of language appropriate to grade level 

E. Design - format, currentness, printing, legibility, pictures 

F. Readability

III. Evaluation of Reliability and Validity

A. Norming population

1. Size 

2. Age, grade, sex 

3. Range of ability 

4. Socioeconomic level 

5. Date of administration 

B. Validity 

1. Content validity 

a. Face validity 

b. Logical or sampling validity 

2. Empirical validity 

a. Concurrent 

b. Predictive 

3. Construct validity 

a. Construct and theory of which construct is a part clearly 
defined 

b. Discriminant or convergent validity evidence 

c. Significant difference found in performance between 
groups which have varying degrees of this trait? 

4. Does reported validity appear adequate in relation to author's 
stated purpose? Why or why not? 

Each test is described, and the strengths and weaknesses are delineated
so that one may evaluate the test oneself. Each review was sent to the
publisher for his reactions. In some cases, additional information was given
the authors and this matter was included in the review. If the necessary
data were not located in the manual but found elsewhere, the appropriate
sources have been indicated. Finally, it should be the teacher who makes
the final decision on the use of a test based on his program: the authors
can only guide and suggest the criteria by which that decision might be
made.



• What is the Responsibility of Test Publishers? 

A test should be placed in the same category as a critical drug. A test
should be used only after a careful study of its effects has been made.
Evidence should be provided that the test (or drug) will do what it
purports to do. Too many critical decisions are made about a child based
on his test scores to use any test that is not a reliable and valid assessment
of his ability to do the task described by the test. A teacher should insist
that the test manufacturers provide him with the same reputable product
that he would demand of a drug manufacturer who offers a new cure. It is
better to use no test than to use an unreliable or invalid one. One finds
that a number of tests are released before adequate data are available.

Many tests have not been studied sufficiently before they are put on
the market for sale. One hopes the reader will note these deficiencies and
realize how serious the action is to make an instructional, promotional, or
evaluational decision about a child when it is not based on an accurate,
stable, or predictive measure of his achievement.






Chapter 2 



SELECTING A READING ACHIEVEMENT TEST 

In Chapter 1 the characteristics of a good reading achievement test were
discussed. Whether a particular test is a good test depends on how it is
being used. For instance, some tests are fine as measures of general
achievement but should never be used for purposes of diagnosis. When
selecting a test, the prospective test user should first ask: "What do I want
to find out? What kind of information should the test give me?" The
merits of a test can only be judged in terms of its proposed usage.

Some important distinctions to be made in regard to testing purposes
are the following:

• Do I want to measure achievement at a particular time, or do I want 
to measure changes in achievement? 

• Am I primarily interested in the performance of a group of students
(as an administrator might be), or is it the individual student I am
primarily interested in (as a teacher might be)?

• Do I want to measure reading and achievement in a general sense, or 
do I have mastery of specific objectives in mind? 

• Should the test provide for comparing the performance of my
students with specific, clearly described norm groups? 

• Do I want diagnostic information telling me on what areas of 
instruction I should focus? 

The answers to these and similar questions will determine to what 
extent a test will be a good instrument for a particular test user. As much 
as possible the reviewers have kept these questions in mind when analyzing 
the merits of a given test. As a footnote, it may be added that in this book
only general reading achievement tests are included. The issue of diagnos- 
tic utility will only be raised from time to time in terms of tests with 
distinct subtests that the test publisher states have diagnostic validity. 

The test selection criteria indirectly referred to in the foregoing
questions and in Chapter 1 are of a basic nature. They should play a major role
in the choice of a test. Quite often, however, more than one test satisfies
the basic criteria a test user has in mind, but there are other important
points to consider, many of a very practical nature.

• How easily can the test be scored; is handscoring feasible; how long
will it take to get machine-scorings returned?

• How long is the test? Is technical quality, such as a high reliability 
coefficient, achieved by an unreasonably large number of items? 

• What is the risk that students will perform badly because of lack of 
clarity in format and directions? 

• What does the test cost? 

Practical issues such as these are far from trivial. They must, however,
never outweigh more basic criteria about the fitness of a given test for the
particular test purpose under consideration.






Chapter 3 



TEST REVIEWS 



• California Achievement Tests: Reading 



Reviewed by Eugene Jongsma
Louisiana State University at New Orleans

Name of Test
California Achievement Tests: Reading

Subtest
Vocabulary
Reading Comprehension

Authors
Ernest W. Tiegs
Willis W. Clark

Publication Date
1957

Revision Date
1970

Publisher
California Test Bureau/McGraw-Hill

Time
50 Minutes

Overview 

The Reading section of the California Achievement Test (CAT-70)
represents an extensive effort at revising and renorming the 1957 edition of this
test. Over 85 percent of the test is composed of new items. Level 4 is
designed for grades 6 through 9, and level 5, for grades 9 through 12. Two
forms of the test, A and B, are available at each level.

At levels 4 and 5, the test consists of a vocabulary section of 40 items
(10 minutes) and a comprehension section of 45 items (40 minutes). Three
scores are available - vocabulary, comprehension, and total. Raw scores
may be converted to grade equivalents, percentile ranks, stanines, or
Achievement Development Scale Scores (ADSS). The ADSS is a scale of
standard scores, ranging from 100 to 900, which is appropriate for making
comparisons across forms and across levels of the test. The publishers
wisely caution test users on the use of grade equivalent scores. In addition
to the conventional hand scoring, Scoreze - a high-speed, self-scoring hand
system - is available, as well as machine-scorable forms (CompuScan, IBM
1230, and Digitek).

Assistant superintendents, reading consultants, and other administrative
personnel charged with the responsibility of directing a large-scale testing
program would want to consult the Test Coordinator's Handbook and
Bulletin of Technical Data Number 1, which contain additional
information pertaining to the development, standardization, administration, and
interpretation of the test.



Norms

The norms for this test were based on a nationwide sample of about
200,000 students representing both public and private schools. A total of
397 schools was included in the sample. The standardization sample was
stratified on three factors: 1) geographic region (seven districts), 2) average
enrollment per grade (small, medium, large), and 3) type of community
(urban, rural, town, and other). The school districts participating in the
standardization are listed in the Test Coordinator's Handbook.

No description is given of the norm group other than the three factors
listed above. It would have been helpful to characterize the norm group in
terms of sex, intelligence, socioeconomic status, or other demographic
variables. As the information now exists, test users will be unable to
determine to what extent their particular populations correspond with the
standardization group. According to the publisher, follow-up
questionnaires were used with each school to obtain descriptive
information. These additional data are to be forthcoming in the Technical
Report.

In the Test Coordinator's Handbook, the publisher encourages the use 
of local norms, especially for interpreting individual scores. Test users 
would be wise to follow this procedure, particularly if the local population 
differs substantially from the standardization group. Directions for prepar- 
ing local norms are to be forthcoming in the Technical Report. 

Reliability 

Reliability data were obtained on a sample of 350 to 400 students per
grade level from the standardization group referred to in the "Norms"
section. The sample of students used in the reliability study was drawn by
systematic sampling so as to be representative of the norm group
population. The data, provided in raw score form, include means, standard
deviations, Kuder-Richardson-20 estimates, and standard errors of
measurement for each grade and each level of the test.

The KR-20 estimates are fairly high, ranging from coefficients of .84 to
.95, and reflect a high degree of internal consistency within the test.
While desirable, internal consistency estimates, such as KR-20, ignore
response variability of subjects and varying effects of testing
conditions. No other reliability information is provided. Notably lacking
are test-retest and parallel-form estimates. The publisher, however, does
promise additional reliability data in the future Technical Report.
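
For readers who wish to see what a KR-20 estimate involves, the following
is a minimal sketch and not the publisher's procedure. It assumes items
scored 1 (right) or 0 (wrong) and applies the conventional
Kuder-Richardson formula 20; the function name kr20 and any data passed to
it are hypothetical.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomously scored items.

    responses: one list per student, each holding 0/1 item scores.
    Uses population variance so that it matches the p*q item variances.
    """
    n_students = len(responses)
    n_items = len(responses[0])

    # Total raw score for each student and the variance of those totals.
    totals = [sum(student) for student in responses]
    mean_total = sum(totals) / n_students
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students
    if var_total == 0:
        raise ValueError("all students have the same total score")

    # Sum of item variances p*q, where p is the proportion passing the item.
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(student[j] for student in responses) / n_students
        pq_sum += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - pq_sum / var_total)
```

Because the estimate rests on a single administration, it cannot reflect
day-to-day variation in students or testing conditions, which is the
limitation noted above.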

Test users should pay particular attention to the standard errors of
measurement which are provided. They are useful in setting confidence
limits around an individual's "true" score.
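
As an illustration of how a standard error of measurement yields
confidence limits, the sketch below assumes the conventional relation
SEM = SD x sqrt(1 - reliability) and a band of plus or minus z standard
errors around the observed score. The publisher supplies its own SEM
values; the function names and any figures used with them are
hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Conventional SEM from a score standard deviation and a reliability."""
    return sd * math.sqrt(1.0 - reliability)


def confidence_band(observed, sd, reliability, z=1.0):
    """Return (low, high) limits of observed +/- z standard errors.

    z=1.0 gives roughly a 68 percent band; z=1.96 roughly 95 percent,
    assuming normally distributed errors of measurement.
    """
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem
```

With a hypothetical standard deviation of 10 and a reliability of .84, the
SEM is about 4 raw score points, so an observed score of 30 would be
reported as a band from about 26 to 34.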

Validity 

Content validity is reportedly based on a nationwide review of widely
used reading tests and a study of curricular objectives and courses of
study from a cross-section of states. However, the specific materials
surveyed, their vintage, and the qualifications of the reviewers are not
cited. A summary of reading objectives is presented which reads like most
traditional reading methods textbooks. Test users should match their own
instructional objectives with the actual tests to decide if the tests are
appropriate evaluative instruments for the particular instructional
programs. This writer believes that many of the objectives cited by the
publisher are not measured by the test, e.g., "ability to read materials
in specialized content areas, adjusting rate and concentration to the
purpose of their reading." Indeed, one wonders whether the intended
purpose of the overall test, "to measure progress in reading gained from
various methods of instruction," is feasible or even desirable. No type
of validity other than content validity is reported.



Evaluation of Subtests and Items 

The Vocabulary section consists of 40 items and is to be administered 
in 10 minutes. A stimulus word is presented in context, albeit limited, and 
the student must choose the best synonym from among four choices. The 
stimulus words represent various parts of speech and come from a range of 
subject areas. 

The Comprehension section consists of 45 items and is to be administered
in 40 minutes. For both levels 4 and 5, the first six items are
intended to measure ability to use reference or study skills. While this is a 
valid aspect of reading instruction at the junior and senior high school 
level, six items are too few upon which to make an accurate and reliable 
judgment of a student's ability in this area. 

The remaining portion of the Comprehension section consists of passages,
arranged in an ascending order of difficulty, representing various types
of materials - science, social studies, mathematics, and general. The
passages are followed by four- or five-option multiple-choice items. Some
vocabulary-type items are interspersed in the Comprehension section.
Many of the comprehension items are specifically tied to the passage by
the use of stems, such as "In this article, the writer's purpose is . . ."
and "the purpose of the third paragraph is . . ." This tends to make items
reading-dependent and serves to relieve a problem that has plagued the
measurement of reading comprehension for years.

The Technical Report explains rather thoroughly how the time limits
were developed. The time allowed should be adequate for most students. 
The test should not be considered "speeded" except for a very small 
number of students who are experiencing extreme reading difficulty. 



Summary 

Complete judgment regarding the Reading section of the newly developed
California Achievement Test (CAT-70) will need to be reserved until
more complete technical data become available. The information still
needed includes: 1) a description of the materials and process used in
developing the test objectives and content; 2) additional reliability
data, particularly parallel-form and test-retest estimates; and
3) directions for preparing local norms.

In spite of the limitations cited in this review, this newly developed test
represents a marked improvement over the older editions. Features such as
the presentation of vocabulary words in context, the use of content area
passages, a greater number of reading-dependent questions, and a more
thorough discussion of development and standardization procedures add
to the test's usefulness. The test is best suited as a survey measure of a
junior or senior high school student's general reading ability.






Cooperative English Tests - Reading Section 



Reviewed by V. Michael Lahey
Virginia Commonwealth University 

Name of Test
Cooperative English Tests - Reading Section

Subtests
Vocabulary
Comprehension

Authors
C. Derrick
D. P. Harris
B. Walker

Publication Date
1940

Revision Date
1960

Publisher
Educational Testing Service

Time
40 Minutes



Overview 

The Cooperative English Tests represent a 1960 revision and
restandardization of older tests by the same name. These new tests
measure achievement in two areas, reading and written expression. This
review, however, will concern itself only with the reading tests.

The current tests are designed to replace the old forms R, RX, Y, and Z
(Higher Level) and R, RX, T, Y, and Z (Lower Level). The newer tests are
prepared at two levels with three forms at each level. High school students
from grades 9 through 12 usually use one of the three forms at level two,
listed as 2A, 2B, and 2C. For above average students in grade twelve it is
suggested that form 1A or 1B should be used. These two forms are
suggested for average college freshmen and sophomores. Form 1C, also a
college level test, is reserved for use only with admitted college freshmen
and sophomores. The test manuals suggest that higher or lower level forms
should be used with above or below average students.

Although the use of percentile bands for score reporting is more
accurate than the use of grade equivalents, insufficient attention is
devoted to the problem of interpreting very high or very low scores.
Students in either of these two groups on any test are not accurately
measured. Of particular concern are those whose scores place them in the
very low group. A chance score (25 percent correct for four-choice,
multiple-choice items) for the 60 items in the Vocabulary or Speed of
Comprehension subtests would be 15 items correct. Taking the Vocabulary
subtest as an example, this score results in a 2 through 15 percentile
band in the ninth grade norm table and a .9 through 4 percentile band in
the college sophomore norm table. This problem should be more forcefully
presented than in the current set of manuals. Perhaps a mark or a line
indicating the level of chance scores with some mention of the
unreliability of scores below this level would be desirable.
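
The chance-score arithmetic above can be sketched as follows. This is an
illustration only: the function names are hypothetical, and the rule of
flagging any score at or below chance is an assumption suggested by the
reviewer's remark, not something the manuals provide.

```python
def chance_score(n_items, n_options):
    """Expected number right when every item is answered by blind guessing."""
    return n_items / n_options


def at_or_below_chance(raw_score, n_items, n_options):
    """True when a raw score is no better than blind guessing would yield."""
    return raw_score <= chance_score(n_items, n_options)
```

For the 60 four-choice items of the Vocabulary subtest,
chance_score(60, 4) gives 15 items correct, the figure cited in the review.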

Scoring can be done by machine (specific directions are included) or by
hand. Hand scoring is accomplished by the use of overlays. Tables in the
Manual for Interpreting Scores provide converted scores from their raw
score equivalents. The converted scores are then used to find the
appropriate percentile band in the norm table corresponding to the
student's grade level. With the exception of college freshmen, all norm
testing was done in the spring. College freshmen were tested in the fall.
The Manual for Interpreting Scores provides a table which suggests the
appropriate norm table for use when testing does not occur at the same
time of year used in establishing the norms.

The significance of score differences is very simply found. If the two 
percentile bands being compared do not share at least one common 
percentile score, then their differences are significant. 
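
The band-comparison rule just described can be expressed in a few lines.
The sketch below assumes each percentile band is given as a (low, high)
pair with both ends included; the function names are hypothetical.

```python
def bands_overlap(band_a, band_b):
    """True when two inclusive (low, high) percentile bands share a value."""
    low_a, high_a = band_a
    low_b, high_b = band_b
    return low_a <= high_b and low_b <= high_a


def difference_is_significant(band_a, band_b):
    """Apply the rule in the text: no shared percentile means significant."""
    return not bands_overlap(band_a, band_b)
```

Under this reading, a band of 2 through 15 and a band of 15 through 30
share the 15th percentile, so the difference would not be treated as
significant; bands of 2 through 15 and 16 through 30 would be.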

The three manuals, Directions for Administering and Scoring, Manual
for Interpreting Scores, and Technical Report, are all well done. There is
no indication of any revision being made to any of the manuals since their
1960 publication date. Perhaps it would be in order to revise the section
on validity in view of the many validation studies on this test conducted
since 1960.

Norms 

The tests were normed on both high school and college students. On
the high school level norms were prepared for grades 9 through 12. The
number of students included in the norming sample is not indicated.
However, all students in grades 9 through 12 in the cooperating schools
were tested. Slightly more than half of the 160 schools randomly selected
produced usable results. Lack of cooperation or administrative errors
removed the remainder from consideration. The manual points out that the
Southern states are overrepresented while the New England and Middle
Atlantic states are underrepresented. An examination of the schools listed
as participants would seem to indicate a greater number of rural or small
town schools and a corresponding lack of schools in large metropolitan
areas.

A 10 percent sample of the schools in the norm group was given the
School and College Ability Tests to obtain an estimate of the verbal ability
of the sample. Both groups had the same standard deviation and had mean
scores within four converted-score points. The manner in which the 10
percent sample of schools was selected was not mentioned.

In norming the college level tests an effort was made to select a sample
representative of the United States in regard to region and type of college.
Colleges were divided into three geographic regions - North, South, and
West. They were also divided into two levels: 1) liberal arts colleges and
universities and 2) teachers' colleges, junior colleges, and technological
schools. Originally 150 schools were contacted. Of these, 130 agreed to
cooperate, but only 105 returned usable data. Only 78 colleges returned
sophomore test data.

The authors recommend the development of local norms as being more 
informative. A complete, step-by-step procedure for establishing local 
norms is provided in the Manual for Interpreting Scores.
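
The manual's own procedure is not reproduced in this review. Purely as an
illustration of what a local norm table contains, the sketch below
computes percentile ranks from a set of local scores, assuming the common
midpoint definition (the percent of students scoring below a given score
plus half the percent scoring exactly at it). The function names and any
scores supplied are hypothetical.

```python
def percentile_rank(score, local_scores):
    """Midpoint percentile rank of a score within a local group."""
    below = sum(1 for s in local_scores if s < score)
    equal = sum(1 for s in local_scores if s == score)
    return 100.0 * (below + 0.5 * equal) / len(local_scores)


def local_norm_table(local_scores):
    """Map each distinct observed score to its local percentile rank."""
    return {
        score: percentile_rank(score, local_scores)
        for score in sorted(set(local_scores))
    }
```

A school would apply such a table to converted scores from its own
students, so that a pupil is compared with the local population rather
than with the national sample.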

Reliability 

Correlation with alternate forms was the method used to assess
reliability. Two different forms were administered within a one-week
period. By rotating the forms administered, it was possible to obtain
reliability coefficients. Correlations of each form with the other two
forms were averaged. Total Reading reliabilities ranged from .91 to .94.
On the subtests of the various forms and levels, correlations ranged from
.71 to .89. The highest reliabilities of the subtests were consistently
found to be on the Vocabulary subtest, .87 to .89. The lowest
reliabilities were found on the Level of Comprehension subtest, .71 to
.78, which is also the shortest subtest. The Speed of Comprehension
subtests had reliability ranges from .81 to .87. Split-half reliabilities
were not computed, perhaps because a speed test contributed 40 percent to
the total score. The test reliability would seem to be good for this test.

Validity 

Validity is always a difficult topic to deal with. The authors feel that 
content validity is best assured by relying on well-qualified people to 
construct the tests. The authors further state that the validity of the 
revised tests should not be expected to be greatly different from the earlier 
versions. Only one study of the validity of the test is reported in the 
manuals while several studies are presented giving correlations of the 
earlier forms of the Cooperative English Tests with various criteria. 

Evaluation of Subtests 

Total Reading Score is derived from three scorings of two subtests. The 
60 items on the Vocabulary subtest provide one score. The 60 items on
the Comprehension subtest are scored in two ways to give two additional 
scores. The first 30 items are scored separately to provide a Level of 
Comprehension score while the total 60 items are scored to provide a 
Speed of Comprehension score. 

The strategy behind the split score on the comprehension section is that 
the first 30 items could be completed by all students, making it a power 
test. On the other hand, the time limit for the full 60 items is such that 
few students can be expected to finish. This section then becomes a speed 
test.

The Vocabulary subtest presents a word in isolation followed by four 
choices. In view of recent linguistic findings that words derive their 
meanings from context, it would seem more desirable to use a context 
stem instead of a solitary word. Reading does not require the interpreta- 
tion of words in isolation, but rather the ability to provide differing 
meanings for the same word in different contexts. The Comprehension 
subtest has a problem common to all tests of this type. Do they really 
measure reading comprehension? Several skills in addition to comprehension
confound the scores. The ability to skim back over the item and
locate information quickly is as important as the ability to understand or 
comprehend. 

The directions are the same for all forms of the subtests so that both 
levels or all forms may be given at the same time. A trained test
administrator is not required so long as the clear and concise directions are
carefully followed. 

Summary 

The Cooperative English Tests measure achievement of high school and
college students in reading and written expression. In general, the tests
have been carefully constructed and provide a realistic means of
interpreting scores.

Norming was conducted in the spring for high school students and
college sophomores and in the fall for college freshmen. Some regional
biases are reported, and some bias toward size of community is evident.
However, results are close to those obtained on the School and College
Ability Tests typically given to high school seniors. The reliability
coefficients are satisfactorily high.

Reporting scores by the use of percentile bands rather than scores is an 
excellent idea and undoubtedly aids interpretation. It may prove
confusing, however, to think of the percentile band as a single score as is done
in the Manual for Interpreting Scores. A score falls in one place, but a 
band covers a wide range of scores and cannot be thought of as a single 
score. 

This test, in spite of a few weaknesses, would seem to be a good test for 
measuring reading achievement of high school and college students. 



Davis Reading Test

Reviewed by Robert L. Schreiner
University of Minnesota 



Name of Test
Davis Reading Test

Subtests
Level of Comprehension
Speed of Comprehension

Authors
F. B. Davis
C. C. Davis

Publication Date
1957

Revision Date
1961

Publisher
Psychological Corporation

Time
40 Minutes



Overview 

The Davis Reading Test was originally published in 1957 and revised in
1961 by the Psychological Corporation. There are two levels of the test.
Series 1 is for average to above average pupils in grades 11 and 12 and for
college freshmen. Series 2 is for pupils in grades 8, 9, 10, and 11. There are
four parallel forms at each of the two levels of the test; hence, it may be
used for repeated measures of secondary level reading ability.

Each series provides two separate but related scores. The Level of
Comprehension subtest consists of the first 40 items of the test. The score
on this subtest indicates the depth of understanding displayed by a pupil
in reading materials ordinarily required for high school or college success.
The second subtest, Speed of Comprehension, consisting of the first 40
plus the remainder of the items, is designed to determine how rapidly and
accurately pupils must read and understand material in order to achieve
academic success.

The authors state that the test is designed to measure the following five
reading comprehension skills: 1) locating explicit or paraphrased
information; 2) assimilating specific thoughts within a passage to grasp
the central or main thought; 3) determining inferences regarding a passage
and/or the author's purpose and his point of view; 4) recognizing tone,
mood, and other literary devices used by an author; and 5) following the
structure of a passage.

Pupils are allowed 40 minutes to complete the test. The actual testing 
session may take from 45 to 55 minutes. This feature makes the test quite 
practical for most secondary reading class periods. The examinee needs a 
test booklet, pencil, and an answer sheet to use the test. 

The test can be scored either by hand or machine. Whichever procedure
is used, the raw scores obtained by the pupils are corrected for guessing by
using the following scoring formula: number right minus one-fourth the
number wrong. The authors indicate that the purpose of the correction
formula is to discourage wild guessing and to prevent sophisticated
examinees from having an advantage over cautious or naive examinees in the
Speed of Comprehension subtest. Pupils are told to omit questions rather
than to guess wildly, but they are not told about the use of a formula to
adjust the raw score for guessing.
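
The scoring formula can be sketched as below. The review states only
"number right minus one-fourth the number wrong"; writing the divisor as
the number of options minus one is the conventional generalization and is
an assumption here, as is the function name. Omitted items are neither
rewarded nor penalized.

```python
def corrected_score(n_right, n_wrong, n_options=5):
    """Raw score corrected for guessing: R - W / (k - 1).

    With five-option items (k = 5) this is number right minus one-fourth
    the number wrong. Omitted items do not enter the calculation.
    """
    return n_right - n_wrong / (n_options - 1)
```

On 80 five-option items, a pupil who guessed blindly throughout would be
expected to get 16 right and 64 wrong, for a corrected score of zero. A
pupil with 60 right, 20 wrong, and none omitted would receive 55.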

The directions for the administration of the Davis Test are adequate
and clear. Very little, if any, training is required to administer the test. The
examiner's manual is extremely well organized. One problem this reviewer
noticed dealt with the application of the guessing formula. The authors
suggest that another person check each pupil's score for accuracy of
calculation. This may be an impractical suggestion.

The reading passages cover a variety of subject matter areas. The
content of the passages appears to be interesting and to reflect careful
choices in order to represent all the subject matter areas appropriate for
secondary pupils. The subtest Speed of Comprehension is a different way
of measuring reading speed. The Speed score is determined by the total
number of items correct for the entire test. While this procedure may be
meaningful from a logical analysis point of view, it does not appear to be
diagnostically meaningful. If a pupil shows weaknesses in speed of
comprehension, it is not clear from his performance on the test what the
teacher should do about this particular deficit. One thing seems clear:
traditional speeded reading activities, those skills frequently taught in
most secondary reading programs, would not be sufficient to improve a
pupil's speed of comprehension score on the Davis Test. This skill seems
to indicate that pupils should be provided with a purpose prior to reading
each passage. Of course, purpose setting does not occur during the testing
session.

Tables are provided in the examiner's manual so that raw scores may be
converted into scaled scores. This procedure allows the consumer to make
either group or individual comparisons. Hence, the teacher may be able to
determine growth in reading ability for a large group of pupils or permit
comparisons of growth of individual pupils. It also facilitates the
interpretation of different scores between the separate subtests of the
reading test.

The authors' purposes for the selection of the items of the test are
presented in the examiner's manual and are based on their careful research
and survey of the literature related to reading comprehension skills. The
authors initially located over a hundred reading comprehension skills and
indicated that many comprehension skills are related to mechanics of
reading while others are incapable of measurement by objective test items.
The original list of one hundred was reduced to nine clusters, called
operational skills, employed in comprehension in reading. The first two of
the skills are related to verbal aptitude. Because of the pervasive nature of
word memory, it was decided not to provide a separate measure of
vocabulary in the Davis Tests. The remainder was grouped into the five
categories previously mentioned. The following list provides the
approximate number of items per grouping for each of the four forms of the
test: 1) locating specific information (20), 2) locating central thought
(20), 3) determining inferences and authors' purposes (20), 4) finding
literary techniques (10), and 5) following passage structure (10).
Originally, 650 items were constructed to measure these five reading
comprehension skills. These items were then administered, with an
unlimited amount of time, to pupils to determine which items were suitable.
Finally, 320 items were selected for the construction of the four forms of
Series 1. A similar procedure was used to select items for Series 2.
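
The item counts above can be checked with a short tally. The category
labels and counts are those given in the review (the counts are
approximate); the variable and function names are hypothetical.

```python
# Approximate number of items per skill category on one form of the test.
ITEMS_PER_SKILL = {
    "locating specific information": 20,
    "locating central thought": 20,
    "determining inferences and authors' purposes": 20,
    "finding literary techniques": 10,
    "following passage structure": 10,
}
FORMS_PER_SERIES = 4


def items_per_form(blueprint):
    """Total items on a single form, summed over skill categories."""
    return sum(blueprint.values())


def items_per_series(blueprint, n_forms):
    """Total items needed to build every form in one series."""
    return items_per_form(blueprint) * n_forms
```

Under these counts each form has 80 items, and four forms require 320,
which agrees with the number of items selected for Series 1.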

A teacher may wish to examine the pupil response patterns on each of 
the categories upon which the test is constructed. The skill category for
each of the items on both forms of the Davis Reading Tests may be 
obtained by writing to the Psychological Corporation. 

Norms 

Percentile norms for each of the subtests of the Davis Test are provided
in the examiner's manual. The norms for the Series 1 tests are based on the
testing in 1957 of over 18,000 pupils in grades 11 and 12 and college
freshmen. Eighteen colleges and universities administered the tests to their
entering freshmen, and 29 high schools contributed norm samples for
grades 11 and 12. These institutions were well distributed geographically.
However, it appears that the high school sample could have included
schools of medium or small size and pupils from lower socioeconomic
sections of the population. Percentile norms for Series 2 (grades 8 to 11)
are provided based on test results administered to 21,000 pupils in 1961.
Fifty-two high schools contributed norm samples for these grade levels.
Again, this norm sample was drawn from larger metropolitan areas with a
smaller percentage of pupils from lower socioeconomic areas than might
be desirable.

Reliability 

Reliability coefficients for Series 1 of the Davis Test were based on the
administration of two equivalent forms of the test with a three-week
interval between administrations for pupils in grades 11 and 12 and on a
one-week interval for college freshmen. Reliability coefficients for Series 2
forms were based on the administration of two forms of the test with two
to four weeks between administrations for grades 8 through 11.

The average reliability coefficients reported for Series 2, Level of
Comprehension subtest, are .84 (grade 8), .84 (grade 9), .78 (grade 10),
and .77 (grade 11). For Series 1 they are .74 (grade 11), .77 (grade 12),
and .80 (college freshmen).

The average reliability coefficients reported for the Speed of
Comprehension subtest, Series 2, are .91 (grade 8), .91 (grade 9), .86
(grade 10), and .86 (grade 11). For Series 1 they are .84 (grade 11), .85
(grade 12), and .88 (college freshmen). These reliability coefficients
compare favorably with other current measures of reading ability.






Validity 

The Davis Reading Test was designed, by careful analysis, to measure
the five reading comprehension skills previously mentioned. The authors
refer the user to the results of several factorial studies that substantiate
these same reading comprehension skills. They indicate that the Level of
Comprehension score, based on the first 40 items, measures only accuracy
or depth of comprehension. They further indicate that pupils who do not
complete the first 40 items are likely to be reading disability cases and that
these pupils should be referred for individual diagnosis and remedial work.
The Speed of Comprehension subtest is designed to measure both accuracy
and depth of understanding plus speed of reading. The pupil's score on all
80 items is used to determine Speed of Comprehension.

Upon inspection of some of the items, one finds many detailed questions
that appear to hinder speed. Some of the questions seem to require
the pupil to retain a substantial number of details to obtain the correct
answer. Frequently, this reviewer had to return to the reading passage and
count objects or events in order to answer a question. For example, "How
many plants gotten by Europe from America are mentioned in the passage?
a) eleven, b) eight, c) seven, d) four, or e) six."

Data are presented to indicate the predictive usefulness of the Davis
Test. The test was administered to a sample of over I JOO college students.
The results of the test were compared to the obtained first semester
English course grade. Correlation coefficients between these two measures
were obtained, and they ranged from .36 to .57 with a median of .46.
Each pupil's first semester grade point average was also compared with his
performance on the reading test. These coefficients ranged from .37 to
..>6 with a median of .46. Data on the predictive validity of the test at the
high school level are also presented. Individual pupil performance at grades
11 and 12 on the Davis Test was compared to the grade achieved in
English. These correlation coefficients ranged from .15 to .64 with a
median of approximately .46.

Correlation data are presented comparing the results of pupil
performance on the Davis Reading Test and other similar tests. The
following correlation coefficients between the Davis Test and other tests
are reported: 1) STEP Reading, .76 (college freshmen); 2) DAT Verbal
Reasoning, .76; 3) ITBS Reading, .77 (Grade 8); 4) COOP Reading
Comprehension, .75 (Grade 10); and 5) ITED Composite, .77 (Grade 10).

Evaluation of Subtests

In order for a standardized test of reading ability to be truly useful, it
must provide diagnostically relevant information to the test consumer. The
subtest scores that the pupils receive must give some clues so that teachers
may be provided with appropriate teaching or remedial alternatives. For
example, if a pupil receives a substantially lower scaled score on the Level
of Comprehension subtest than on the Speed of Comprehension subtest of
the Davis Test, what information does this result provide to the teacher?
What teaching methods and materials must the teacher now employ to
help the pupil increase his level of reading comprehension? This kind of
information is not provided by the authors.

A similar situation exists if a pupil's score is lower on the Speed of
Comprehension subtest. The authors take great pains to distinguish
between reading speed as traditionally defined and speed of comprehension,
i.e., rapid reading with thorough understanding. However, how does one
teach pupils to read rapidly with increased accuracy? The authors do not
provide suggestions.

Most secondary reading programs provide systematic instruction for
pupils aimed at the improvement of reading speed measured in words per
minute, level of vocabulary, and comprehension skills. The kinds of
instructional activities associated with reading improvement programs at
the secondary levels do not appear to be directly measured by the Davis
Test. Reading speed, it seems to this reviewer, is more appropriately
measured if passages of greater length are used that deal with a variety of
disciplines represented within the high school curriculum, followed by a
series of comprehension questions. Secondary pupils rarely read short
passages as used in the Davis Tests and then answer five to six
multiple-choice questions about them.

Summary 

The authors indicate that accuracy of reading comprehension is much
more dependent on ability to associate word meanings correctly than any
other mental ability. No specific measure of vocabulary ability, however, is
included as a part of the test. The authors indicate that memory of word
meanings is so pervasive an ability that it was decided not to measure it
separately. One could argue that the processes underlying reading
comprehension are even more pervasive and, therefore, they defy
measurement. This reviewer feels that some assessment of a pupil's
vocabulary ability is important diagnostic information for the secondary
reading teacher.

The skills assessed by the Davis Reading Test appear to be globally
defined by the authors and, hence, are of limited diagnostic use to the
teacher of secondary reading. The test is extremely well constructed and
exemplary in all aspects of treatment of statistical data. This reviewer
would suggest, however, that the test be used at the secondary level as a
survey measure to assess general reading ability. It appears to have limited
diagnostic value.

Diagnostic Reading Tests

Reviewed by Eugene Jongsma
Louisiana State University at New Orleans



Name of Test
Diagnostic Reading Tests

Subtests
Survey Section
Diagnostic Battery

Authors
Committee on Diagnostic Reading Tests

Publication Date
1947

Revision Date
Varies with subtests

Publisher
Committee on Diagnostic Reading Tests

Time
Varies with subtests



Overview 

The Diagnostic Reading Tests are the product of an ambitious effort by
the Committee on Diagnostic Reading Tests to develop a survey and
diagnostic battery appropriate for students from kindergarten through the
college freshman year. The various components of the test battery are
outlined below.

A. Survey Section: Upper Level (Forms A through H)
   1. Rate of Reading
   2. Vocabulary
   3. Comprehension
   4. Total Comprehension

B. The Diagnostic Battery (Forms A and B)
   Section I: Vocabulary
      1. English
      2. Mathematics
      3. Science
      4. Social Studies
      5. Total
   Section II: Comprehension
      Part 1. Silent
      Part 2. Auditory
   Section III: Rates of Reading
   Section IV: Word Attack
      Part 1. Oral
      Part 2. Silent

The Survey Section is intended to be used as an independent measure
of general reading ability or as a screening instrument to identify students
in need of further diagnostic testing. The multiple sections of the
Diagnostic Battery are designed to provide a more specific assessment of
the various aspects of the reading process, i.e., vocabulary,
comprehension, rate, and word attack. Each section of the battery comes as
a separate test booklet with a separate set of directions. Raw scores may
only be converted to percentile ranks as the committee is firmly opposed
to the use of grade norms or grade equivalent scores.

Norms 

Except for Section II: Silent and Auditory Comprehension, all norms 
are included in a separate booklet of consolidated norm tables entitled The 
Diagnostic Reading Tests: Norms (mi). This booklet must be ordered in 
addition to the tests. 

The normative data presented by the committee are woefully inadequate 
for a number of reasons. First, one set of norms is provided for all 
forms of each section or subtest. The publisher justifies this lack by 
claiming that the forms of each subtest are equivalent. Yet, no proof of 
equivalency is given. Second, many of the subtest norms are extremely 
outdated; some are twenty years old. Third, on some subtests, particularly 
certain sections of the Diagnostic Battery, the norms are based on too 
small a population to be representative or reliable. Last, and most 
important, a test user has no information as to the kinds of groups on which the 
tests were standardized. Thus, it is virtually meaningless for a test user to 
interpret his students' results with the norms provided. In the norm 
booklet mentioned, the committee states, "The characteristics of this 
(standardization) population must be defined so that a norm group may be 
selected which is similar to the particular group being tested" (p. 5). Yet, 
no information concerning the aptitude, ability, socioeconomic status, or 
geographic origins of the norm group is provided by the committee. 

Careful examination of the norm tables reveals another deficiency. It 
appears that some sections of the battery are too difficult for seventh 
graders and possibly for some eighth graders; consequently, the sections do 
not provide an appropriate level assessment of their reading abilities. For 
example, a seventh grader receiving a chance score (a raw score of 10 out 
of 50, five-option items) on the social studies part of the Section I: 
Vocabulary Test would rank at the 44th percentile for his grade level. This 
example clearly shows that this particular subtest is not adequately 
measuring the vocabulary skills of most seventh graders. 
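
The chance score cited in this example follows from simple arithmetic: on items with five options, blind guessing is expected to yield one correct answer in five. A minimal sketch in Python, in which the function name is illustrative and not drawn from the test manuals:

```python
def chance_score(n_items: int, n_options: int) -> float:
    """Expected raw score from blind guessing on multiple-choice items."""
    if n_items < 0 or n_options < 1:
        raise ValueError("need n_items >= 0 and n_options >= 1")
    return n_items / n_options


# Social studies part of Section I: 50 items, five options each.
print(chance_score(50, 5))  # 10.0
```

A raw score at this level carries no information about vocabulary knowledge, which is why a 44th-percentile rank for it signals a problem with the norms rather than with the student.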

In light of the inadequacies identified in this section, test users would 
be wise to bypass the available norms and accept the committee's offer to 
compute their local norms without charge. 

Reliability 

The reliability data, also presented in The Diagnostic Reading Tests: 
Norms, are insufficient. A median coefficient based on the 
Kuder-Richardson 21 procedure is presented for each subtest and test totals. 
Although the coefficients are fairly high, ranging from .80 to .97, it is not 
clear just what has been "averaged." It is also likely that the KR-21 
estimates are inappropriate for some subtests on which speed is a factor. 
As with the norms, the group used to obtain the reliability estimates is not 
described. The size of this sample is not even given. The reliability 
estimates that are presented are meant to apply to all forms of the test. No 
estimate of alternate-form reliability is available. 
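
For readers unfamiliar with the procedure, the Kuder-Richardson formula 21 estimate needs only the number of items, the mean, and the variance of the total scores from a single administration. The sketch below is a generic illustration, not the committee's computation: the scores are invented, and the use of the population variance is an assumption.

```python
from statistics import mean, pvariance


def kr21(total_scores, n_items):
    """Kuder-Richardson formula 21 from total scores on an n_items test."""
    k = n_items
    m = mean(total_scores)
    var = pvariance(total_scores)  # assumption: population variance
    if k < 2 or var == 0:
        raise ValueError("need at least 2 items and non-zero score variance")
    return (k / (k - 1)) * (1 - m * (k - m) / (k * var))


# Hypothetical total scores of five students on a 50-item subtest.
scores = [12, 25, 31, 38, 44]
print(round(kr21(scores, 50), 2))  # 0.92
```

Because the formula rests on a single timed sitting, it cannot separate knowledge from speed, which is one reason such estimates are questionable for subtests on which many students do not finish.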

One subtest with questionable reliability is Section III: Rates of Reading 
Test. The publisher does, however, caution that if this test is to be used 
in individual diagnosis, both forms should be given, one immediately 
following the other. 

Validity 

Perhaps the greatest technical weakness of the Diagnostic Reading Tests 
is the complete lack of evidence pertaining to their validity. The test 
battery is promoted and sold as a "diagnostic" instrument, yet no data are 
provided which demonstrate the independence of the subtests. Subtest 
intercorrelations are lacking. 

Perhaps more fundamental is the absence of a rationale for construction 
of the test. There does not seem to have been a systematic plan for 
developing the test objectives, choosing the test material, and selecting the 
test items. At least no such scheme is explicitly stated. 

Instead of providing the necessary validity information in the test 
manuals, the publisher has shifted the burden of proof to the test con- 
sumer by encouraging him to locate and read research related to the use of 
the Diagnostic Reading Tests. Although it is desirable for test users to 
familiarize themselves with such literature, the test publisher has a definite 
responsibility to provide sufficient information that will allow a test user 
to make accurate judgments concerning the usefulness and interpretation 
of the test, according to the Standards for Educational and Psychological 
Tests and Manuals. 

Evaluation of Subtests and Items 

The Survey Section is designed primarily as a screening instrument and 
as such yields the following scores: 1) Rate of Reading, 2) Vocabulary, 
3) Comprehension, and 4) Total Comprehension. The rate subtest measures 
the student's usual rate of reading story-type material. Twenty 
multiple-choice comprehension questions follow the passage but do not yield a 
separate score. The Vocabulary subtest consists of 60 items and is to be 
completed in 10 minutes. Definitions are given in context, and the student 
must select from among five the one word which has been defined. The 
Comprehension subtest is comprised of 4 passages and 20 multiple-choice 
items which must be completed in 15 minutes. The items appear to 
measure a variety of skills. The Total Comprehension score is a misnomer 
since 60 of the 100 items upon which this score is based are vocabulary 
items. It should be considered a "total test" score instead. 

The Section I: Vocabulary test is made up of four sections: English, 
Mathematics, Science, and Social Studies. Each section contains 50 items 
of the same format used in the Vocabulary subtest of the Survey Section. 
In fact, several of the items are identical to those used in the Survey test. 
Time is likely to be an important factor for many students on this test. 

The Section II: Comprehension test consists of 16 passages and 50 
multiple-choice comprehension items. The test may be administered as a 
silent comprehension test or as an auditory or listening comprehension 
test. No norms are available, however, for the latter procedure. 

The Section III: Rates of Reading test consists of two passages followed 
by multiple-choice questions. On the first passage, students are directed to 
read at their "regular" rates while on the second passage they are encour- 
aged to read as rapidly as possible without a loss in understanding. No 
percentile norms are available for the comprehension aspect of this test. 

The Section IV: Word Attack test is composed of oral and silent parts. 
For the oral part, which must be administered individually, the student 
reads a series of passages which gradually increase in difficulty while the 
test administrator records substitutions, omissions, and repetitions. This 
work is followed by the pronunciation of isolated words in a series of 
three word lists. The silent part of the test measures the students' abilities 
to match words that have similar sounds and to recognize the 
numbers of syllables in selected words. 

Summary 

The Diagnostic Reading Tests represent an extensive battery of 
subtests, each with its own test booklet and its own manual of directions. 
Unfortunately, some of the information presented in the manuals is 
conflicting and often misleading, creating a complex and often confusing 
array of test materials. This problem could be alleviated somewhat by 
consolidating the directions for the various parts of the diagnostic battery 
into one booklet. 

Technically, the tests leave much to be desired. No evidence of validity 
is provided. The reliability data are insufficient, and useful descriptive 
information, such as means, standard deviations, and standard errors of 
measurement for the norm groups, is not available. 

The test is better suited for high school students than junior high 
pupils. The Survey Section may be used as a measure of general reading 
ability, but the diagnostic utility of the battery is doubtful. 

In light of the inadequacies and limitations raised in this review, test 
consumers would do well to consider other tests on the market which 
offer more substantial and comprehensive technical data. 

In a real sense the Committee on Diagnostic Reading Tests is a victim of 
the times. Started in the 1940s as a noble effort by a handful of dedicated 
professionals, it now finds itself competing with the vast resources and 
technical expertise of the test giants which have emerged in the field of 
measurement. 



• Gates-MacGinitie Reading Tests 

Reviewed by William E. Blanton 
Indiana University 

Name of Test 
Gates-MacGinitie Reading Tests 

Authors 
Arthur I. Gates 
Walter H. MacGinitie 

Overview 

The Gates-MacGinitie Reading Tests represent a new edition standardized 
in 1965 and 1969. This series replaces the Gates Primary and Advanced 
Primary Reading Tests and the Gates Reading Survey. Included in this 
eight-test series are tests for grade levels from kindergarten through grade 
12. The purpose of this review is to cover Survey E for grades 7 through 9 
and Survey F for grades 10 through 12. 

Survey E and Survey F consist of three subtests: Speed and Accuracy, 
Vocabulary, and Comprehension. Three forms are available for Survey E; 
two forms are available for Survey F. Both surveys are also available in 
hand-scored or machine-scored editions. 

The authors are very careful to assist test users in making correct 
interpretations of test results. The Technical Supplement provides tables 
for converting raw scores into standard scores and percentiles, interpreting 
differences between subtest scores, and estimating reading test gains. 
Norms for the beginning, middle, and end of each grade level are also 
provided. 

The significance for differences between subtest scores is determined on 
the basis of probability. If differences between two subtest scores occur 
more than fifteen times out of a hundred, the obtained scores should be 
used in planning reading instruction. Similarly, differences in reading test 
gains are considered significant when they occur more than fifteen times 
out of a hundred. 



Subtests 
Speed and Accuracy Test 
Vocabulary Test 

Publication Date 
1965, 1970 

Publisher 
Teachers College Press 

Time 
44 minutes 






Total reading score is determined by averaging the standard scores of 
the subtests. The authors correctly point out that, when determining 
averages, it is poor practice to sum and divide raw scores since they are not 
based on an equal-interval scale. It should also be pointed out that the 
authors deemphasize grade equivalents, which are rarely accurate for high 
school students. 
The American Psychological Association's Standards for Educational 
and Psychological Tests and Manuals considers it essential that a test 
manual indicate the qualifications required to administer and interpret a 
test properly. It is interesting to note that the Gates-MacGinitie Reading 
Tests manual makes no attempt to do this. Moreover, the Teacher's 
Manual and the Technical Supplement suffer the same deficiencies as those 
of many other reading tests: a discussion of the uses of the test and of the 
reading behaviors sampled is not provided. The Teacher's Manual suggests 
that the teacher may best understand the tasks imposed and meaning of 
the resulting scores by reading the test and carefully considering what is 
involved in getting correct answers. This suggestion is valid to the extent 
that classroom teachers have the expertise to make an item analysis of the 
test. Indeed, it is desirable that the test user familiarize himself with the 
test and its uses. The test publisher, however, has the responsibility of 
providing the user with information necessary for making responsible 
decisions concerning the use and interpretation of the test. 

Norms 

The normative data presented by the publisher are inadequate for a 
number of reasons. First, according to the Technical Manual, the norming 
population was "... carefully selected on the basis of size, geographical 
locations, average educational level, and average family income." These 
criteria, however, are not explicitly defined. It is interesting to note, 
though, that testing was carried out in "... schools, judged by school 
officials to be representative of the community as a whole." Second, 
demographic characteristics of the norming population are not adequately 
described. Third, norms for the hand-scored editions appear to be based on 
too small an "n" to be representative or reliable. Consequently, it is 
extremely difficult for the test user to interpret test results with the 
normative data provided. Users of the test, therefore, might insure the 
most meaningful interpretation of test results by obtaining local norms. 

Reliability 

Both split-half and alternate-form reliability coefficients are reported 
by subtests for machine-scored editions of the tests. In every instance, 
these data are acceptable. On the other hand, no reliability data are 
presented for hand-scored editions. In light of this inadequacy, test users 
would be wise to compute reliability coefficients for local populations or 
have the tests machine-scored. 

Validity 

Validity evidence for the tests is limited. The tests, however, do appear 
to have face validity. Users of the test should be aware of the fact that the 
manual fails to provide a description of the curriculum content which the 
tests purport to measure. More important, there is no evidence that the 
subtests were developed according to the content of reading programs. 
Before using these tests, consequently, the user should examine the 
objectives of his reading program and compare these to the content of the tests. 

Correlations between subtests and Lorge-Thorndike Verbal IQ scores 
are reported. In general, these correlations reveal a high degree of 
relationship between Vocabulary and Comprehension scores and verbal IQ scores. 
Speed and Accuracy test scores, on the other hand, are less related to 
verbal IQ scores. In short, the correlations between subtest scores and 
verbal IQ scores lend credence to the argument that there is a high degree 
of relationship between group measures of reading behavior and group 
measures of verbal IQ. 

Evaluation of Subtests and Items 

According to the authors, "The Speed and Accuracy Test provides an 
objective measure of how rapidly students can read with understanding." 
Items for the Speed and Accuracy Test are 36 short paragraphs ending in a 
question or an incomplete sentence. Students indicate their responses by 
selecting one of four words presented. Two scores are reported: Speed, the 
number of items attempted, and Accuracy, the number of correct items. 
Time allowed for this subtest is four minutes. 

The stated purpose of the Vocabulary subtest is "to sample the stu- 
dents' reading vocabulary." This subtest consists of 50 items in which the 
student matches a test word with one of five words that follow. In both 
surveys, the item progression for this subtest appears to be from easy to 
difficult. A time limit of 15 minutes is allowed for this section. 

According to the authors, the stated purpose of the Comprehension 
Test is to measure, "students' abilities to read complete prose passages 
with understanding." This section contains 21 passages with two to four 
words per paragraph deleted for a total of 52 deletions. For each deletion, 
the student selects the answer which best conforms to the meaning of the 
whole passage from a word list of five alternatives. For both surveys, the 
average difficulty for the passages to be read appears to progress from the 
easiest to most difficult. Time allowed for this section is 25 minutes. 

In sum, the names assigned the three subtests are functional and 
accurately describe the actual tests. It should be noted, however, that the 
Speed and Accuracy items vary in what is required to answer an item. For 
instance, some items require inference as well as matching of a definition 
with the correct word. In addition, the Comprehension Test apparently 
taps only one type of comprehension: the ability to use context clues in 
conjunction with the overall meaning of the passage. 

Summary 

The Gates-MacGinitie Reading Tests, Survey E and Survey F, are a set 
of group reading tests based on recent normative data. The tests yield a 
measure of general reading achievement for students from grades 7 
through 12. In general, the tests are well constructed and provide useful 
information for evaluating growth, screening students for further diagnos- 
tic testing, and organizing pupils for instruction. The use of the tests for 
making classroom decisions, such as diagnosing specific reading skills, is 
limited even though the authors suggest that teachers interpret test scores 
through item analysis. 

The tests have been normed for periodic assessment, and the subtests 
and total test scores are reliable. Still, the test user should examine the 
validity for measuring the objectives of a specific reading program. It 
should also be noted that limitations are found in the description of the 
norming population and in the failure to report reliabilities for 
hand-scored editions. Thus, the development of local norms would aid in the 
interpretation of test scores. 



• Iowa Silent Reading Tests 

Reviewed by Ronald Johnson 
Wisconsin State University at River Falls 



Name of Test 
Iowa Silent Reading Tests 

Publication Date 
1929 

Revision Date 
1973 

Subtests 
Vocabulary 
Comprehension 
Directed Reading 
Reading Efficiency 

Publisher 
Harcourt Brace 



At the time of this writing a thoroughly revised edition of the Iowa Silent 
Reading Tests (ISRT) was being prepared. The publisher made available 
the detailed specifications used in developing the tests and copies of the 
restricted (item-analysis) edition, 1971, tests at each of the three levels. It 
was expected that this edition would conform closely to the final 
published form of the tests. Since this edition was for item-analysis purposes 
only, neither a test administrator's manual nor a technical manual was 
available. This article, therefore, might be more appropriately called a 
preview rather than a review of this test. 

Overview 

Three levels of the Iowa Silent Reading Tests are available: Level I for 
grades 6 through 9, Level II for grades 9 through 12 (average readers), and 
Level III for grades 11 through 14 (superior readers). Levels I and II each 
have four subtests: Vocabulary, Comprehension, Directed Reading, and 
Reading Efficiency. Directed Reading is divided into "Part A: Locating 
Information" and "Part B: Skimming and Scanning." Level III has three 
subtests: Vocabulary, Comprehension, and Reading Efficiency. 

Norms - Reliability - Validity 

No data on these important topics were available at the time of this 
review. This reviewer would strongly urge that any prospective user of this 
test obtain a technical manual from the publisher before purchasing the 
tests. 






Evaluation of Subtests 



In the Vocabulary subtest at all three levels, the student is required to 
select a synonym for a stimulus word from among four distractors. The 
words included in these subtests have been selected to represent general 
reading vocabulary. This procedure represents a marked change from the 
earlier edition of the ISRT where the words were selected as "significant 
words in four high school subjects." 

The comprehension subtests, for the most part, follow the common 
format of a series of multiple-choice questions on selections which increase 
in length and complexity as the student progresses through the test. The 
content of the selections is varied; the great majority, however, seem to be 
oriented toward the social studies. The final selection of the 
Comprehension subtests at each level varies somewhat from this format. In Levels I 
and II the student is directed to "read and study" a rather lengthy 
selection. He then turns the page and answers 16 questions without 
looking back at the selection. It does not appear that the different type of 
recall assumed to be required in this task will be scored separately. In 
Level III the student is directed to read two short selections before 
answering the multiple-choice questions which contrast the views of the 
two writers and shift back and forth with questions on the separate 
selections. It is not clear to this reviewer just what comprehension skills 
this selection is designed to measure; nevertheless, whatever it measures is 
not scored separately, and its contribution will probably be mixed in with 
the rest of the items in this subtest. 

The Directed Reading subtest is a part of Levels I and II only. "Part A: 
Locating Information" attempts to measure the student's reading-study 
skills by stressing the use of a dictionary and locating information in a 
variety of sources. "Part B: Skimming and Scanning" consists of a factual 
article of the kind found in most encyclopedias. Only the article, including 
charts and graphs, is printed in the test booklet. The multiple-choice 
questions are printed on the student's separate answer sheet. The student is 
directed to read each question and to "glance over" the article in order to 
answer the questions. He is specifically told not to try to read the entire 
article. In this writer's experience with the Directed Reading subtest of the 
earlier edition, these directions cannot be stressed too strongly. Invariably, 
a number of students will follow the learning set established over the years 
by similar tests and by the previous subtests in this test. The change in 
procedure does not register; they begin to read the entire article from 
beginning to end, and they answer no questions before the time runs out. 
While a low score for these students does indicate that they do not follow 
directions, their score is not related to the reading skills of skimming and 
scanning that the subtest was designed to measure. The directions in this 
revision are more pointed than those of earlier editions of the ISRT; this 
reviewer, however, suggests that the test administrator place special 
emphasis on the changed format or that he consider administering this subtest 
first, when a higher percentage of the students are listening to the 
directions. There are simpler, more dramatic demonstrations of the student's 
inability to follow directions than is offered by an invalid score on this 
subtest. 

The Reading Efficiency subtest is included in Levels I, II, and III. Like 
the questions for skimming and scanning in Levels I and II, the questions 
for this subtest are printed on the student's answer sheet rather than in the 
test booklet. This subtest represents the only deviation from a 
four-distractor, multiple-choice format in the entire battery. The deviation is 
slight. The student is presented with a connected passage; at certain breaks 
in the passage he is to mark which of three words fits the context (a 
three-distractor, multiple-choice modified cloze procedure). How often 
words are omitted varies from the fourteenth to the thirty-seventh word. 
Some tests using cloze procedure require the student to read beyond the 
blank for information to supply the correct response. 
The ISRT avoids this problem by having every blank occur at the end of a 
sentence. Even though the cloze procedure technique for measuring 
reading comprehension was greatly modified in preparing this subtest, the 
resulting format may well be too artificial to yield a useful estimate of 
either the student's rate of reading or of his comprehension. 

Summary 

Only item-analysis copies of the Iowa Silent Reading Tests, Restricted 
Edition, 1971, were examined. For this reason manuals were not available 
at the time this review was written. 

Three levels are available covering the range from grades 6 
through 14. The two lower levels have subtests measuring Vocabulary, 
Reading Comprehension, Directed Reading, and Reading Efficiency. Level 
III measures Vocabulary, Reading Comprehension, and Reading 
Efficiency. Each of the tests seems to be carefully prepared and easy to use. 
Three important questions are not answered because the data were not 
available when this review was written: 1) Is the norm population 
adequate for the user's student population? 2) Are the scores yielded by this 
test valid? 3) Are they reliable? 

It is most unfortunate that with the long-needed revision of the ISRT, 
it has lost its uniqueness. The older test attempted to measure skills 
directly related to achievement in school subject areas. The revision seems 
to have become almost indistinguishable from all of the other reading tests 
available, both in terms of what it measures and in the items it uses. 



• The Metropolitan Achievement Tests: 
Reading, Advanced Level 

Reviewed by Joe Peterson 
The University of Georgia 



Name of Test 
Metropolitan Achievement Tests: Reading, Advanced Level 

Authors 
W. W. Durost 
H. H. Bixler 
J. W. Wrightstone 
G. A. Prescott 
I. H. Balow 

Publication Date 
1959 

Revision Date 
1970 

Subtests 
Word Knowledge 
Reading 

Publisher 
Harcourt Brace Jovanovich 






Overview 

The Metropolitan Achievement tests, of which the Metropolitan Reading 
tests are a part, have been published in a fourth edition, effective with the 
1970 copyright. Virtually all new material has been created for these tests, 
and the Reading subtest at the advanced level has at least one selection 
relevant to the Black culture. As was the case previously, the Metropolitan 
Reading tests are available as a separate part of the total battery and 
contain two subtests: Word Knowledge and Reading. Three scores are 
generally computed from these two subtests and are confusingly labeled 
Word Knowledge, Reading, and Total Reading. As might be expected, the 
Total Reading score includes the number of correct responses from both 
subtests. 

One of the strong points of the Metropolitan Reading tests has been the 
clarity of the Teachers' Directions and the Teachers' Handbook. These 
continue to be well written and will provide good information to the users 
of the tests. Nothing is taken for granted in these manuals: indeed, the 
discussion begins in the Teachers' Handbook with the candid question 
"Why test?" and proceeds from that point to detailed directions for 
administering the tests and for interpreting the results once the testing has 
been completed. It is commendable that the publishers put their tests into 
a proper perspective in that the tests should be thought of as only one of 
many sources of information to be considered in trying to understand 
pupils. 

Norms 

Contrary to usual practices, standardization procedures have been car- 
ried out on all three forms of the test rather than standardizing one form 
and equating the others to it. Since testing in the schools is almost evenly 
divided between fall and spring, the test constructors decided to 
standardize the tests at two different times during the school year rather than 
once, as has been the practice in the past. Accordingly, Forms G and H 
were standardized in October and Form F in April, each form being 
administered to a balanced sample of about 7,000 students in grades 7 and 
8 and, for Forms G and H, about 4,000 students in Grade 9. Studies of 
equivalency were conducted during the spring standardization program. 

Four types of derived scores are provided for the tests: standard scores, 
percentile ranks, stanines, and grade equivalents. According to the 
Handbook, the basic use of standard scores is for measuring growth within an 
area (Did Johnny exhibit any growth in reading?) whereas stanines and 
percentile ranks provide a means for comparing subtest scores in different 
areas (Is Johnny's reading achievement grossly different from his word 
knowledge?). The familiar grade equivalent score is downgraded for its 
inherent inaccuracies and ease in misinterpretation when dealing with 
individual scores. Emphasis has been placed on the use of the more 
sensible stanines in interpreting comparisons between subtests in order to 
help avoid the overly precise connotation of other derived scores. This 
emphasis should be heeded. 

According to information provided by the publisher, all levels of the 
1958 edition and all levels of the 1970 edition were given to comparable 
groups of unspecified size. A table was compiled from these five batteries 
which shows comparable grade equivalent scores on the two editions of 
the tests so that the results of the old and new editions can be compared. 
Indications are that a grade equivalent of 9.4 on the 1958 Word Knowledge 
subtest would be equivalent to [...] on the 1970 edition, whereas a 9.4 on the 
1958 Reading subtest would come out the same (i.e., 9.4) on the 1970 
edition. Most of the rest of the Word Knowledge scores below 8.0 and the 
Reading subtest scores seem to fluctuate from almost exact equivalence to 
[...] who earned a 5.9 on the 
1958 Reading subtest would probably have earned a 6.4 on the 1970 
Reading subtest. Although the differences between the two tests do not all 
vary in the same direction, the comparison suggests that the average junior 
high reader of 1970 is slightly better than the average junior high reader of 
twelve years ago. Such comparisons, however, should be interpreted 
cautiously, since they rest on the assumption that the norming samples of 
the two editions are drawn from the parent population in exactly the same 
way and that the changes in the content of the material the pupils read on 
the test, when mixed with changes in our culture, are immaterial variables. 

Reliability 

Split-half reliabilities have been computed for both fall and spring 
[...] individually and 
[...], indicating high internal consistency. No data on 
alternate form reliability were available at the time of this review. 

Validity 

Validity of the tests is defined in terms of content validity. A 
description of the process whereby the content of the tests was decided upon has 
been published. Users of the test, however, will need to survey the content 
of the tests to determine the fit of the tests to their curriculum. A 
brochure, Content Outlines, is provided for this purpose. 

Evaluation of Subtests and Items 

In the final analysis, the worth of the Metropolitan Reading Tests as 
[...] ability depends upon the user's definition of reading. 
Given a broad definition of reading, the tests have less than adequate 
worth; given a more restrictive definition, however, the tests do an 
adequate [...]. 

The Total Reading test is made of two parts, a Word Knowledge subtest 
and a Reading subtest. The Word Knowledge subtest consists of [...] words 
[...] b) social studies, c) humanities, d) [...] mathematics, and 
e) antonyms, for each of 
which the pupils have to select one of four meanings. To the casual 
[...] general knowledge [...]. 

The Reading subtest consists of forty-five questions based on seven
selections varying in length from two to five paragraphs. The questions
asked are classified into four types and are represented in the
following proportions in Form F: words in context, 11/45; literal
questions, 7/45; inference questions, 23/45; and main thought questions,
4/45. It should be noted that this portion of the test does a good job of
measuring the higher cognitive process reading abilities and does it with
materials covering the sciences, social sciences, and humanities, all of
which are expository in nature.
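
The proportions above can be checked with a few lines of arithmetic. The
sketch below is only an illustration: it reads the first count as 11, the
value that makes the four counts total the forty-five questions reported,
and the names used are hypothetical rather than anything from the test
materials.

```python
from fractions import Fraction

# Item counts by question type for the Reading subtest, Form F, as given in
# the review. The first count is read as 11 so that the total is 45.
FORM_F_ITEM_COUNTS = {
    "words in context": 11,
    "literal": 7,
    "inference": 23,
    "main thought": 4,
}


def question_type_shares(counts):
    """Return each question type's share of all items as an exact fraction."""
    total = sum(counts.values())
    return {kind: Fraction(n, total) for kind, n in counts.items()}


shares = question_type_shares(FORM_F_ITEM_COUNTS)
for kind, share in shares.items():
    # Print the share both as a fraction of 45 and as a rounded percentage.
    print(f"{kind}: {share.numerator}/{share.denominator} ({float(share):.0%})")
```

Seen this way, inference questions alone account for slightly more than
half of the subtest.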

Complete as these Reading tests seem, however, they do not measure
rate, nor do they include any materials to check reading ability in the
literary materials usually found in the English-language arts curriculum,
nor are any of the traditional work-study skills evaluated. If, however,
the whole Metropolitan Achievement Test battery has been given, the
enterprising user can find information on some of these work-study skills
by extracting information from the first 34 items in the Language test
and from 12 scattered items in the Mathematics and Science tests. The
specialized skill of reading maps and charts can likewise be checked,
since it comprises the last 24 items in the Social Studies test. The lack
of convenient norms on this combination of items and the size of the task
of extracting information in such a fashion suggest that these portions
of the reading act will be examined by few users of the battery.

Summary 

Although some of the components necessary for complete evaluation of 
the Metropolitan Reading tests have not been published as of this review, 
it seems apparent that the tests are technically well constructed. For the 
user wanting to evaluate a limited portion of the act of reading or to 
perform an initial screening of pupils for special services, the Reading 
subtest of the Total Reading Test seems worthy of consideration. The user 
who wants to examine reading in a broader scope would do well to
consider giving the complete Metropolitan Achievement Test Battery and
arranging for computer analysis of the parts mentioned.

• Nelson-Denny Reading Test 

Reviewed by Roger Farr 
Indiana University 

Name of Test: Nelson-Denny Reading Test
Subtests: Vocabulary, Comprehension, Rate
Publication Date: 1929
Revision Date: 1960
Authors: M. J. Nelson, E. C. Denny, J. I. Brown
Publisher: Houghton-Mifflin
Time: 40

Overview

The Nelson-Denny Reading Test is designed for use in grades 9 through 16
and is available in two separate forms. The authors state that the test
serves predictive, screening, and broadly diagnostic purposes. Three
subtest scores are available: Rate, Vocabulary, and Comprehension. The
Vocabulary test consists of one hundred multiple-choice items and is a
timed (10 minutes) test; the Comprehension test, which is also timed (20
minutes), consists of 36 multiple-choice items based on a series of
reading selections; the Rate score is based on the number of words of the
first comprehension selection which an examinee reads during the first
minute of the comprehension test.

Five kinds of answer sheets are available: IBM sheets (1230s and 805s)
for machine or hand scoring, MRC answer cards for machine scoring, a
self-marking answer sheet with a carbon marking system, and Digitek. The
directions are clear and complete; however, examinees should be watched
carefully when they move from the Vocabulary to the Comprehension
tests, as the test booklets must be turned over, the Comprehension subtest
being printed on the back of the Vocabulary subtest.

The total test score is arrived at by allowing two points for each 
comprehension question that is answered correctly and one point for each 
vocabulary question answered correctly. The rationale for this procedure is 
that the total score will thus provide a better balance between the 
vocabulary and comprehension factors. Twice as much time, however, is
allotted to the Comprehension test as to the Vocabulary test. The lack of
any empirical basis for the scoring procedure is regrettable.
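
The weighting rule described above is simple enough to state as a short
sketch. The function and constant names are hypothetical; only the item
counts (100 vocabulary, 36 comprehension) and the two-for-one weighting
come from the review.

```python
# Item counts reported in the review for the Nelson-Denny subtests.
VOCABULARY_ITEMS = 100
COMPREHENSION_ITEMS = 36


def nelson_denny_total(vocabulary_correct, comprehension_correct):
    """Total score: one point per vocabulary item, two per comprehension item."""
    if not 0 <= vocabulary_correct <= VOCABULARY_ITEMS:
        raise ValueError("vocabulary_correct must be between 0 and 100")
    if not 0 <= comprehension_correct <= COMPREHENSION_ITEMS:
        raise ValueError("comprehension_correct must be between 0 and 36")
    return vocabulary_correct + 2 * comprehension_correct


# Highest possible total, and the part of it contributed by comprehension.
max_total = nelson_denny_total(VOCABULARY_ITEMS, COMPREHENSION_ITEMS)
comprehension_share = 2 * COMPREHENSION_ITEMS / max_total
print(max_total, round(comprehension_share, 2))
```

Even with the doubled weight, comprehension contributes at most 72 of the
172 possible points, so the vocabulary items still carry the larger part
of the total score.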

Norms 

A stratified random sample of 8,472,478 subjects yielding [?]0,866
tested subjects was used to establish the test norms for grades 9-12. The
stratification was based on geographical region (eight sections of the
country) and on community size (four population ranges). The norming
population for the college grades (13-16) was randomly selected from five
different types of higher education institutions. For both the high school
and college samples the norming population is of satisfactory size and has
been adequately selected and described. It would have been helpful,
however, if additional descriptive information such as socioeconomic levels
and intelligence test scores had also been supplied for the norming groups.

The test authors do not make any recommendations or suggestions
regarding the development of local norms. This reviewer has often found
that the most meaningful test interpretations can be made when a test
score is compared to a population with which the test user is quite
familiar. Test consumers should, therefore, seriously consider the
development of local norms for the specific uses and the specific
situations in which they want to use this test.

Reliability 

The reliability data are quite insufficient. The sample sizes used are 
extremely small and inadequately described. Test consumers, therefore, 
should not rely on the reliability coefficients in the manual as a guide to 
interpreting test scores. The coefficients which are reported are of
sufficient magnitude, but there is no way of knowing if the populations
studied are comparable to the population an examiner is testing. In
addition, the manual reports a reliability coefficient of .93 for the
one-minute rate test. In a number of studies, however, this reviewer has
been unable to establish a reliability coefficient for the rate measure
even approaching .80.

The procedures used to develop the Standard Error of Measurement
Table in the reliability section are not described. It could not be
determined what population was used to compute the standard errors for the
various subtests reported in the table. Under this condition little reliance 
should be placed in the data in this table. 

Validity 

The validity evidence for the three stated test purposes (prediction,
screening, and diagnostic) is generally inadequate. There is only one small 
predictive study reported in the manual. This study is not adequately 
described, nor can much use be made of it for making predictive decisions 
about students. 

The diagnostic validity evidence for the subtests is completely lacking. 
Not only do the test authors fail to report any evidence regarding the
amount of overlap between subtests, the authors also seem to be
unfamiliar with this usual state of affairs. In discussing uses of the
test, the authors state: "More often than not, however, a student's test
profile will show one area well above or below the others." Anyone who
has spent much
time studying the research on reading test validity will easily recognize the 
invalidity of this statement. 

The development of the test does, however, seem to provide both face 
and content validity for using the test as a general screening measure for 
assessing students' reading abilities. In addition, the carefully developed
norm sample and the percentile and grade norm tables provide a useful 
means for interpreting the scores. 

Evaluation of Subtests and Items

The attempt to develop a reading test which spans eight grade levels is 
probably a mistake. A test which is difficult enough for college seniors
will certainly have little bottom in it to measure the reading ability of
the average ninth grade student. This is a major weakness of all the items
and subtests on the test.

In addition, both the Vocabulary and Comprehension subtests are
timed so strictly that very few examinees can complete the test. Difficulty
is built into the test, therefore, by the use of time restrictions rather than
by measuring increased reading ability. These two subtests should be
properly titled "Speed of Reading Vocabulary" and "Speed of Reading
Comprehension."

Generally, the content of the test seems to be better suited to the 
reading interests and abilities of college students than it does to those of 
high school students. The content of the reading comprehension selections 
also seems to favor those students with literary interests. There is little 
emphasis on scientific-type reading material. 

Summary 

The Nelson-Denny reading test should be used only for broad screening 
purposes when an examiner wishes to determine students' speeds of 
reading. The test is heavily timed and is more suited to college students
than to high school students. There is almost no evidence to support use of
the test as a diagnostic or predictive measure.

The reliability and validity evidence is completely inadequate for most
purposes. While the norming population is adequate, most test consumers
would be better off to develop local norms for their own situations.
Despite the fact that the Nelson-Denny has been a favorite of teachers for
many years, there are several other high school and college reading tests on
the market which have sounder theoretical bases and which will serve most
testing needs more adequately than the Nelson-Denny.

• The Nelson Reading Test 

Reviewed by Lawrence M. Kasdon 
Ferkauf Graduate School, New York City



Name of Test: Nelson Reading Test
Subtests: Vocabulary, Comprehension
Revision Date: 1962
Author: M. J. Nelson
Publisher: Houghton-Mifflin
Time: 40

Overview 

The Nelson Reading Test, Revised Edition, is a new edition published in
1962 and developed to replace the Nelson Silent Reading Test. The test
has two forms, A and B. It is designed to measure vocabulary and
comprehension for grades 3-9. The test yields three scores: vocabulary,
paragraph comprehension, and total reading score. An Examiner's Manual,
Self-Marking Answer Sheets, IBM Answer Sheets, Digitek Answer Sheets,
and Class Record Sheets are available. For a fee, Houghton-Mifflin will
score answer sheets and provide Building and School System Percentile
Norms.

In addition to the scoring methods indicated, the IBM scoring keys can
be used for hand scoring answers recorded on IBM answer sheets. Tables
are provided in the manual to convert subtest and total test raw scores to
grade and percentile norms. These tables were standardized at midyear so
that percentile scores for the beginning or end of the year are arrived at by
interpolation. For example, if a sixth grade child's total reading raw score
is 61 on a test administered in September, this score would place him at
the 21st percentile according to the sixth grade norms and at the 47th
percentile according to the fifth grade norms. The test author concluded,
"Therefore, a reasonable expectation for this child would be a rank at the
34th percentile - halfway between the two percentile ranks determined
previously."
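
The author's interpolation in this example amounts to taking the midpoint
of two midyear percentile ranks. A minimal sketch follows; the function
name is hypothetical, and the manual's exact procedure for other testing
dates is not described in this review.

```python
def midpoint_percentile(own_grade_rank, previous_grade_rank):
    """Midpoint of two midyear percentile ranks for a beginning-of-year test.

    own_grade_rank: rank of the raw score on the pupil's current grade norms.
    previous_grade_rank: rank of the same raw score on the prior grade norms.
    """
    for rank in (own_grade_rank, previous_grade_rank):
        if not 0 <= rank <= 100:
            raise ValueError("percentile ranks must lie between 0 and 100")
    return (own_grade_rank + previous_grade_rank) / 2


# The review's example: a raw score of 61 earned by a sixth grader in
# September ranks at the 21st percentile on sixth grade norms and at the
# 47th percentile on fifth grade norms.
expected_rank = midpoint_percentile(own_grade_rank=21, previous_grade_rank=47)
print(expected_rank)
```

The midpoint is only as good as the assumption behind it, namely that
reading growth is spread evenly over the school year.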

The author is frank in mentioning the limitations of grade equivalent
scores. Again, since the test was standardized in January, all other scores
are obtained by interpolation. Although it is expensive to standardize a
test, the author and the publishers ought to standardize a test at the
beginning, middle, and end of the school year in order to offer the
consumer viable norms.

The total score represents the total number of items correct. Separate
sets of percentile and grade norms have been calculated for these raw 
scores. 

The Examiner's Manual is reasonably satisfactory in format and content.
All directions to be read to the pupils are printed in boldface type,
and directions for the examiner are in regular type. Both types of
directions are simple and clear-cut. The manual contains tables of
percentile rank for each grade level as well as a grade equivalent norm
table. Raw scores can be converted to these two statistics in a
straightforward manner. The section "Some Uses of the Test" is probably
the weakest part of the manual. After a brief discussion of how to use the
test data, the author, to his credit, confesses that his suggestions for
using the test results are superficial. His major problem is that he tries
to use a survey test as if it were a diagnostic test.

Norms

The test was normed on approximately 18,000 students in 5[?]
communities in 37 states. The author states that the samples were selected
to represent four regions in the United States and that these areas were
further stratified by community size. Although the author tried to obtain
[?]0 percent of his samples from the Southern states, he, in fact, obtained
40 percent from this area of the country, leaving other areas
underrepresented in the standardization. A list of communities was
randomly selected within the size and regional strata. A responsible
person in the school system of the community was asked to select
classrooms at random. The appropriate level of the Henmon-Nelson Test of
Mental Ability was administered to the standardization group. The mean IQs
for the various levels ranged from 103.1[?] to 109.73 with the median at
106.46. In view of these findings, the author of the test reviewed 11
studies in which the Henmon-Nelson test was used. He found that the
Henmon-Nelson generally yielded slightly lower means than other individual
and group intelligence tests. He concluded that the Nelson Reading Test
standardization sample was above average. By statistical procedures he
brought the average IQ of the standardization sample to 100 for each grade
level.

Within the limitations mentioned the norms can be cautiously accepted 
as being representative of national performance. The author does suggest 
the development of local norms, and this reviewer feels that local norms 
would make for a more precise and meaningful interpretation of the 
reading scores. 

To obtain a minimum grade score of 2.00 on either Vocabulary or 
Comprehension, the student must have a raw score of seven items correct,
an amount which seems empirically appealing. Again, this reviewer would
like to point out that this test may not discriminate well among students
in the third and fourth grades of less-than-average ability in reading.
Inasmuch as the test was standardized on samples from grades 3-9, grade scores
below 3.5 and above 9.5 are interpolations and must be regarded with 
caution. The percentile ranks would not adequately reflect the perform- 
ance of extreme groups since the tables do not reflect either a floor or 
ceiling effect. 






Reliability 

Reliability indexes were computed by the alternate-form procedure so
that the consumer can judge how accurately a score on one form of the
test will be reproduced if he measures students on another form of the
test. Reliabilities are reported for Vocabulary, Comprehension, and Total
Score at each grade level and for each of the two forms of the test. The
author does not offer any information about the samples on which the
reliabilities were computed except that they varied in size from 81 to 105
students. More information is necessary before one is able to interpret
these reliabilities for his class. Also, the author does not state whether
both forms of the test were administered on the same day, on alternate
days, or a year apart. The alternate-form reliabilities for Vocabulary and
Comprehension range from .81 to .89 and for the Total Score from .88 to
.9[?]. Alternate-form reliability is a rather conservative estimate of
test reliability, and the figures reported are satisfactory.

Another type of information on the test's reliability is available in
terms of the standard error of measurement for both raw scores and grade
equivalents for each grade on both forms of the test. If a student were
tested many times on a series of equivalent tests, disregarding such
elements as practice and fatigue, his score would vary; the standard error
of measurement is the calculated estimate of this variation.
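
The standard error of measurement is conventionally estimated from a
score's standard deviation and its reliability coefficient as
SEM = SD * sqrt(1 - r). The sketch below applies that textbook formula
only; how the manual computed its own table is not stated in this review,
and the standard deviation and reliability in the example are hypothetical
values chosen for illustration.

```python
import math


def standard_error_of_measurement(standard_deviation, reliability):
    """Textbook estimate: SEM = SD * sqrt(1 - reliability)."""
    if standard_deviation < 0:
        raise ValueError("standard_deviation must be non-negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return standard_deviation * math.sqrt(1.0 - reliability)


def score_band(observed_score, sem, width=1.0):
    """Band of +/- width SEMs around an observed score."""
    return observed_score - width * sem, observed_score + width * sem


# Hypothetical values, not taken from the test manual.
sem = standard_error_of_measurement(standard_deviation=10.0, reliability=0.84)
low, high = score_band(observed_score=61, sem=sem)
print(f"SEM = {sem:.1f}; band = {low:.1f} to {high:.1f}")
```

A band of one standard error on either side of the obtained score is a
common way of reminding the user that a single score is an estimate and
not a fixed point.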

Validity 

The evidence of the validity of the test is rather limited. Except for the
addition of two paragraphs to each form, all of the items by and large were
selected from the three forms of the earlier edition of the test. Thus most
of the content is from the 1930s. Therefore, on whatever basis the content
was selected in the 1930s and to the extent that the reading curriculum
has changed since, the curricular validity of the test is weakened. Teachers
are advised to examine this test to be sure that it adequately measures the
objectives of their reading programs.

A case for the concurrent validity of the Nelson is made by citing
correlations between it and the Iowa Tests of Basic Skills. The Vocabulary
and Comprehension subtests correlate from .62 to .88. These correlations
reflect moderate to high concurrent validity. The Nelson Reading Test
together with the Nelson-Denny Reading Test are intended to provide
continuous measurement of reading ability from Grade 3 through the
adult level. The correlation between the two tests for 247 ninth graders
was .82 for Vocabulary, .70 for paragraph Comprehension, and .84 for
Total. One can conclude that both tests are measuring the same skills to a
fair degree.

Nearly 92 percent of the students participating in the standardization
were administered the appropriate level of the Henmon-Nelson Tests of
Mental Ability as well as the Nelson Reading Test. One cannot help but
wonder if the loss of 8 percent of the cases had a significant effect on the
makeup of the original standardization group. Only correlations of total
scores for both tests are given in the manual. For the establishment of
concurrent validity the correlations are sufficiently large. Unfortunately,
no information is given on the correlation between the Verbal subtest of
the Henmon-Nelson and the Nelson Reading Test subtests, as one would
expect such correlation to be quite high. The correlations for these total
scores do not reflect the usual pattern of growing larger as comparisons are
made with higher grade levels. This point should warrant investigation.

Evaluation of Subtests 

Both forms of the test contain 100 vocabulary items and 75 compre- 
hension questions. The working time for the Vocabulary section is 10 
minutes and for Comprehension, 20 minutes. Considering the number of
items and the time limits, this is a combination of a speed test and a power
test. 

The Vocabulary test contains 100 words of increasing difficulty. The
five multiple-choice answers are sometimes synonymous with the word
tested; in other cases, they are descriptive of function or attribute, with an
occasional antonym added for even greater variety. This variation requires
considerable mental agility on the part of the pupil. On occasion, the word
being tested is easier than the answer: "A quart is a measure of 1)
enthusiasm 2) opportunity 3) capacity 4) temperature 5) geometry." This
part of the test may be too difficult for third graders of less-than-average
ability and may yield little information about them.

Paragraph Comprehension consists of 26 paragraphs of increasing dif- 
ficulty. Except for a couple of the last paragraphs, the style and content 
smack of the content of reading texts of the pre-Sputnik era. The 
information contained in Test Paragraph X, Form B, on interplanetary 
travel is somewhat dated. Each paragraph is followed by three
multiple-choice questions. Each of the three questions is designed to
measure a different type of comprehension - general meaning, details, and
predicting outcomes. Having a predicting-outcome question for each
paragraph means
that the paragraphs had to be written for that purpose. For reasons known 
best to himself, the author has mixed the order of these three types of 
questions. This practice disregards the importance of setting purpose for 
reading and confuses the pupil who has established a set from working the 
sample paragraph. See Practice Exercise, question 1, for example. In other 
cases, some of the questions do not fit the categories the author has 
established. For example, see question 18, Form A. 

Summary 

The Nelson Reading Test provides a general measure of reading
achievement for grades 3.5-9.5. From a statistical point of view the tests
are well
constructed. Unfortunately, little information is given about the samples 
on which reliability and validity data are based; thus, the user cannot 
know whether these data would apply to his population. The grade 
equivalent norms are somewhat restricted at the lower and upper ends of 
the test. If a teacher is satisfied using percentile ranks, those in the manual 
are quite adequate for grades 3.5-9.5. The words in the vocabulary section 
and the paragraphs used for comprehension appear to be somewhat dated. 
In addition, a few paragraphs contain inaccurate information, e.g., "tigers
do not inhabit forests" (Test Paragraph 5, Form A).

The tests have been normed only for the midyear (January) so that
interpolations, based on the rather untenable assumption that growth in
reading throughout the year is uniform, need to be made to obtain
percentiles for the beginning and end of the year. A teacher should
carefully compare her objectives in reading with those of the test when
thinking about its validity. The interform reliability of the test is quite
adequate. Perhaps the greatest advantages of the test are in its
[illegible].

• Sequential Tests of Educational Progress 
Series II: Reading 

Reviewed by Thomas Estes, 
University of Virginia 

Name of Test: STEP: Reading
Subtests: [illegible]
Publication Date: [illegible]
Publisher: Cooperative Tests and Services, Educational Testing Service
Time: 45 minutes

Overview 

The new STEP Reading Test is part of STEP Series II, a comprehensive
battery of tests designed to measure ability and achievement [illegible].

The test has equivalent forms, A and B, and four levels, 1 through 4. It
spans a grade range of possible use from grade 4 to grade 14. [illegible]
since the only score yielded is of Reading Comprehension. Vocabulary and
speed subscores are not computed. The 65 items are split into two parts:
Part I, a 30-item [illegible] completion test, and Part II, a 35-item,
30-minute paragraph comprehension test. These combine to yield one
[illegible].

As a part of the STEP Battery or as a broad screening device for general
reading skill, the test is useful. Those, however, who have in mind
[illegible] provided by ETS requires use of NCS answer sheets.

Directions for administration and scoring of the test appear to be
[illegible] do not emphasize strongly enough the importance of
replicating as nearly as possible the exact conditions under which
standardization took place. Raw scores transform easily [illegible].

The STEP battery of tests shares a common weakness with its similar
competitors: any specific test tends to be lost in the crowd [illegible]
the needs of a person using only one of the tests. Attempts to deal with
this problem in this case seem to have resulted in a thick, finely
[illegible] book of norms and a scanty, rather non-specific user's manual.
It seems strange that a test enjoying such wide acceptance and of such
[illegible] quality as this one provides so little user's assistance in
its manuals. Hopefully, later editions of the manuals, along with a more
comprehensive handbook, will remedy this problem.

Norms 

The normative sample used for this test is not clearly defined by the
preliminary handbook. The claim is for a representative sample of students 
at all educational levels, but procedures by which this representation was 
insured are not mentioned. 

Whatever its exact nature, the sample on whom the reading was stand- 
ardized was adequate in size for levels 2, 3, and 4 (grades 10-12, 7-9, and 
4-6). The total number of pupils tested was 26,678. Unfortunately, for 
level 1, the college level test, a very small sample was drawn, numbering a
scant 921. 

While national norms have advantages, most meaningful interpretation 
of test results is often obtained by use of local norms. Cooperative Tests 
and Services provides a scoring service for users of the STEP which 
includes a computation of local norms as one reference group, in addition 
to the nationally drawn reference group. Users of the test should consider 
taking advantage of this valuable service. 

Reliability 

A reliability coefficient and a standard error of measurement are
provided for every grade level in which the test should be administered.
For Form A, 1,000 pupils were used to generate these data at each grade
level. Perhaps as a reflection of this amount, the reliabilities for Form A
are sufficient in magnitude and stability across grades, ranging from .88 to
.92. For Form B, this is not the case: a much smaller population was used,
and the reliabilities range from .84 to .95.

There is no mention of how the pupils were chosen for the reliability 
study. Worse still, no mention is made of the method used to derive the 
reliabilities. This is an important consideration for timed-reading tests:
until more information may be made available, caution should be exer- 
cised regarding faith in the reported reliabilities. 

Validity 

The present manual of this test never directly addresses the question of 
validity. This omission is unfortunate in light of the claim that the test is 
designed principally as an aid in improvement of instruction. No evidence 
is offered to suggest that the test predicts reaction to improved instruc- 
tion. There is no evidence, furthermore, that the results are in any way 
related to other measures, either in the STEP battery or apart from it, or 
that the manner in which reading is defined by the test is justifiable. Such 
criterion and construct-validity information is forthcoming in a promised 
technical manual, unavailable at the time of this review. Even so, the test is 
at best prematurely available for use; at worst, it is still in its experimental
infancy, despite the tenure of the STEP battery. 

The authors do, on the other hand, provide evidence of content 
validity. A separate table of specifications is given for each level of Form 
A. These allow the user to examine the kinds of comprehension the test
claims to measure. More precisely, he can determine what each item in the
comprehension section claims to measure. For example, in Form 3A items
3, 12, and 26 of Part B intend to measure "straightforward comprehension
of science material"; items 23 and 27, on the other hand, assess
"evaluation of logic" in "narrative" material. Used with the discretion the
manual suggests, this information could be valuable.

In sum, this test's strongest suit is content validity. It appears to tap a 
range of reading abilities, broadly classed as vocabulary and comprehen- 
sion. In the comprehension section readers deal with a variety of material 
types in a variety of ways. The test calls for at least six kinds of 
comprehension: "straightforward" comprehension, drawing inferences,
understanding main ideas and supporting details, seeing applications,
evaluating logic, and sensing style and tone. It is sadly unfortunate that
the validity of such an instrument has to be taken at face value only -
empirical evidence would inestimably increase its worth.

Evaluation of Items 

The provision of four separate levels of this test is appropriate since the
difficulty of the items is likely to be more in keeping with the abilities of
the majority of pupils taking the test. In addition, the paragraphs on the
test cover a range of possible interest value. The reading passages are
appealing in both content and length, and questions are asked in a sensible,
noninsulting fashion.

In format the test is also pleasing. Type size is adequately adjusted for 
different levels, and directions to the examinee are clearly stated. The 
mechanics of taking the test should interfere minimally with results. 

Summary 

The STEP Reading Test is probably most effectively used as a part of 
the STEP battery of tests. In this setting, it can reveal a student's relative 
standing in reading as compared to other areas of achievement. Separated
from its companion tests, however, the test loses its main strength.

A single reading score is provided, though a screening device in reading 
should probably include some estimate of speed and vocabulary. Moreover,
the reliability and validity information for the test is limited. No
reading test approaches perfection, nor will one until test consumers raise
their voices higher in demand of better quality. Even so, much more
confidence in this test would be inspired by a little more supporting data.
Later editions of materials to accompany the tests may well provide the
needful inspiration. 



• SRA Achievement Series (Multilevel Edition) 

Reviewed by Nancy Roser 
University of Texas at Austin

Name of Test: SRA Achievement Series (Multilevel Edition)
Subtests: Comprehension, Vocabulary
Publication Date: 1954
Revision Date: 1963
Authors: L. P. Thorpe, D. W. Lefever, R. A. Naslund
Publisher: Science Research Associates
Time: 77

Overview 

The 1963 revision of the SRA Achievement Series incorporates a reading
battery which yields Comprehension and Vocabulary scores for grades 7
through 9, as well as a supplementary Work-Study Skills test. The
publishers suggest that, while the revision is designed to provide a
complete battery of scores in seven areas, any single subtest can be
purchased and administered separately. The revised forms (C and D) reflect
changes in
educational content and sequence that have taken place since the earlier 
forms (A and B) were published in 1954-1957. 

A unique feature of the total test, and consequently of the reading and 
study skills batteries, is that it is multilevel in nature; i.e., while the test is
packaged as a whole for grades 4 through 9, the student's entry and 
stopping points vary with his grade placement and the time of year in 
which the test is administered. The content of the levels overlaps to 
provide continuity and to allow testing for a broad range of achievement 
within a group of students. Entry points are color coded with the students' 
answer sheets, green being the representative color for grade 7 and red for 
grade 9. The test administrator may elect either green or red during the 
student's eighth grade year, thus opting for a lower or higher level entry 
point depending upon prior testing and performance data collected. No 
specific criterion for making the appropriate entry point decision is pro- 
vided; but suggestions are made as to grade levels at which the tests are 
most often administered, the grades and time of year for which norms are
available, and the possible range of grade equivalent scores. 

It would have been helpful if the test authors had provided more
descriptive information pertaining to methods of item validation and
placement. The consumer must make some assumptions that the test items
are suitably scaled in difficulty so that different entry points are
meaningful.

Student responses to either of the two forms are recorded on Docutran
scoring sheets, which can be either hand-scored or returned to SRA for
machine scoring. If SRA scores the response sheets, information is
provided as to 1) list report of scores, 2) ranked list report of scores,
3) report of average scores, 4) special report of average scores, 5) local
percentile norms and frequency distributions, 6) individual labels for
cumulative records, 7) pupil progress and profile charts, 8) item analysis
report, and 9) individual item reports.

In other words, SRA will provide such information as rank order, grade
placement equivalents, stanines, and percentiles based upon raw score to
allow group and individual comparisons through use of local norms, as well
as information as to how the group being tested compares with a
representative national sample. It seems valuable to note that the
response sheet can be coded with other pertinent data about a particular
student being tested, e.g., his sex, intelligence quotient, and/or some
sociometric indicator, so that group comparisons can be made during the
in-house scoring. In addition, teachers are supplied with individual pupil
profiles indicative of each pupil's performance on each test item. The
publishers make the latter information readily available to facilitate
diagnostic planning and instruction.

Accessories for the test include: 1) an examiner's manual, 2) a general 
guide for planning and organizing the testing program, 3) an interpretive 
manual for groundwork in terminology and application, and 4) a technical 
manual for test technicians. The Examiner's Manual is thorough and 
complete. Parts which are to be read to the students are printed with a 
contrasting ink color and inset. An interpretive manual provides a basic 
and detailed guide to utilization of the test results as well as simple, yet 
cogent, definitions of terms. The test administrator is led to recognize the 
value of local norms and to realize that the benefits of a standardized 
instrument are contingent upon appropriate application and interpreta- 
tion. The decision to publish separately the Interpretive Guide from the 
shorter accompanying booklet Organizing Your Testing Program appears 
to be an unfortunate one. The user may find himself shuffling booklets if 
he fails to keep separate the information contained in each. The technical 
manual has strengths in its clarity and thoroughness in reporting. 

Time allotments are specific and generous. The total time for the 
reading test is 77 minutes, including directions. Time for the Work-Study 
Skills test is 76 minutes, including directions. Because a power test (rather 
than a speeded test) was the authors' intent, approximately 90 percent of 
the students are expected to finish within the required time. 

Norms 

The standardization sample for the total test consisted of 71,199 
subjects in grades 1 through 9. Obtaining a proportionate geographical 
representation, while giving attention to urban versus rural residence, was 
the one criterion in selection of the sample for each grade level. The 
manual, however, provides no information specific to the manner in which 
the sample was drawn. What the user can determine from the technical 
manual is that in order to arrive at proportional representation from each 
geographical category, an undetermined number of students were ran- 
domly eliminated from overrepresented areas and randomly duplicated in 
underrepresented areas. In all, over 20 percent of the standardization 
sample was excluded in determining the norms. 

Reliability 

Reliability coefficients are provided for all subtests of both forms. 
Coefficients appear sufficiently high across all batteries, with composite 
reliabilities on the multilevel edition equal to or greater than .97. Only the 
Kuder-Richardson Formula 20 was used to compute reliability. While this 
is one suitable attempt to estimate one source of variance, other reliability 
measures could have been employed to examine other sources of variance. 
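
For readers unfamiliar with the statistic, the following is a minimal sketch of how a Kuder-Richardson Formula 20 coefficient is computed from right/wrong item scores. The response matrix is invented for illustration and has nothing to do with the SRA data; the function name `kr20` is likewise only an assumption of this sketch.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 for a list of examinees, each a list of 0/1 item scores."""
    k = len(responses[0])                      # number of items
    n = len(responses)                         # number of examinees
    totals = [sum(person) for person in responses]
    total_var = pvariance(totals)              # variance of the total scores
    # Sum of p*q over items, where p is the proportion answering correctly.
    pq = 0.0
    for item in range(k):
        p = sum(person[item] for person in responses) / n
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / total_var)


# Hypothetical data: 5 examinees, 4 items (1 = right, 0 = wrong).
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(toy), 3))
```

Because the coefficient is computed from a single administration, it reflects only the internal consistency of the items. It says nothing about stability over time or about the equivalence of forms, which is the kind of additional evidence the paragraph above has in mind.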

Validity 

The authors suggest that the test be examined by each administrator for 
content validity, i.e., each user should compare the content and skills of 
his curricular intent with the test itself. After building a case for the value 
of overlapping content, the authors defend the face validity of the test, 
adhering to the belief that such measures can provide for continuity of 
evaluation in a seamless curriculum. 

In order to derive some validity estimates, an attempt was made to 
determine the number of independent dimensions measured by the series. 
The subtests were analyzed by use of the principal-components method of 
factor analysis. The factor loadings resulting from the analysis included: 1) 
thoughtful reading, 2) computational (quantitative) ability, and 3) lan- 
guage ability. The data indicated consistency across level and form in the 
Achievement series as well as indicating that different broad abilities were 
being measured by the Language Arts, Arithmetic, and Reading batteries. 

Finally, an attempt was made to estimate the validity of the Multilevel 
Edition by examining the correlation between grade equivalent scores 
taken one year apart on different forms. The median correlation between 
subtest forms was .76, with a range from .62 to .88 for the series. 
Correlations for Reading Comprehension and Reading Vocabulary were 
.72 and .69, respectively, between Forms C and D. 

Evaluation and Subtests 

The format of the total test is such that story materials are drawn from 
the fields of social studies, science, and literature. The relatively lengthy 
selections are followed by items purporting to assess the students' abilities 
to understand the overall theme, to identify the main ideas in paragraphs, 
to infer logical results, to retain significant details, and, finally, to under- 
stand the meaning of words in context (the latter score constituting the 
Vocabulary subtest score). At the upper levels, approximately four ques- 
tions are devoted to literary interpretation of two poems. The authors 
believe that the ability to retain ideas in order to make comparisons, as 
well as the ability to read at a reasonable rate, is assessed incidentally. 

Vocabulary words are underlined in the context of each passage. 
Students are asked to select the appropriate meaning or shade of meaning 
from one to four choices. By presenting vocabulary words in context, the 
authors have avoided one common criticism leveled against many reading 
tests. Three word lists were consulted to check the appropriateness of 
subtest vocabulary (Gates, Rinsland, and Thorndike-Lorge). The user 
might be justified in questioning the validity of these lists in view of the 
ages of the instruments as well as the differing data collection techniques 
employed by each compiler. 

The Work-Study Skills subtest (published separately) yields scores on 
References and Charts. The References subtest yields a measure of compe- 
tence in the use of the table of contents, index, and general reference 
material. The Charts score is based upon achievement in interpreting 
charts, graphs, maps, and tables. Representative selections from elemen- 
tary and junior high textbooks, newspapers, and magazines were included. 
The most heavily assessed skill in the Reference subtest is the ability to 
select an appropriate encyclopedia volume while interpretation of bar and 
circle graphs receives the most attention in the Charts subtest. 

Summary 

The SRA Achievement Series (Reading) appears to have several 
strengths to recommend its use, including ease of administration, clarity of 
format, and continuous, overlapping tests which allow closer approxima- 
tions of the extremes within a classroom. The large standardization sam- 
ple, as well as the provision for local norms facilitating inter- and intra- 
group comparison, contributes to the value of the instrument. 

The time limit is reasonable, although longer than the average class 
period. The three reading scores {Vocabulary, Comprehension, and Total 
Reading) seem to be a logical breakdown. 

Editor's note: Two new forms of the SRA Achievement Series (E and F) are 
currently being standardized, too late for inclusion in this review. Full scoring service 
for the new forms will be available in Fall 1971, with complete technical information 
available in January 1972. 






• Stanford Achievement Test: High School Reading 

Reviewed by J. Jaap Tuinman 
Indiana University 

Name of Test: Stanford Achievement Test: High School Reading 
Revision Date: 1965 
Subtests: None 
Authors: E. F. Gardner, J. C. Merwin, R. Callis, R. Madden 
Publication Date: 1922 
Publisher: Harcourt Brace Jovanovich 


Overview 

This test is one of a ten-test battery of achievement tests for use in grades 
9-12. As such it is an upward extension of the Stanford Achievement Test, 
grades 1-9, which has been on the market since 1922, with the latest 
edition published in 1964. 

The Reading Test has three forms: W, X, and S. Only the first two are 
available for normal use. The latter form is a so-called "secure" form, to be 
used only in special testing programs under conditions which warrant 
minimal exposure of its content to unqualified persons (such as students 
who have to take the test at some later time). 

According to the publisher's promotional flyer this test is "a measure of 
paragraph comprehension, testing ability to understand what is explicit in 
the material read, to judge what is implied, and to draw inferences 
applicable to other situations." The manual accompanying the reading test 
gives little additional information about the purpose of the test: "The 
Reading test consists of paragraphs of increasing length from a half-dozen 
lines to paragraphs of nearly 40 lines. Multiple-choice questions are used to 
measure the comprehension of the paragraph." In all there are 65 ques- 
tions. In addition to the paragraphs with multiple-choice questions, there 
are paragraphs in which words have been deleted. Comprehension of those 
paragraphs is measured by having the student select the best word to fill 
the blank from four choices. (The publisher should note that at least one 
story exceeds the 40 lines mentioned in the foregoing quote.) 






The test results in a single raw score; there are no subtests. The 
directions for administration are very clear. The test itself requires 40 
minutes; in addition, about 10 minutes are needed for distributing mate- 
rials, completing the identifying information section, and giving general 
directions. Four different types of answer sheets are available for use with 
the test: IBM 805, IBM 1230, Digitek, and MRC. Appropriate directions 
for administering each type are provided in the answer sheet packages. The 
test can be hand scored using an overlay key. Users must keep in mind that 
students marking more than one option for an answer may have an 
advantage unless the answer sheets are checked for such practice. Com- 
plete scoring and reporting service is available from the publisher for MRC, 
IBM 805, and IBM 1230 answer sheets. Two types of class records are 
available: one for use with the 12 tests in the complete battery of the 
Stanford High School Test and an abridged version with space for eight 
tests designed to be used with individual tests or in combination thereof. 

The manual is in general a prime example of what an up-to-date test 
manual should be. The variety of information which the manual contains 
and the care with which premises and implications have been stated are 
exemplary. The development of the test is described in satisfactory detail, 
although the description of the tryout program is rather elaborate. Among 
the other attractive features of the manual are its well-balanced discussion 
of the use of the test results and the judicious inclusion of relevant and 
clear tabular material. Unfortunately, no special manual for the separate 
tests seems to be available. As a consequence, specific suggestions regard- 
ing the use of scores of one particular test, such as Reading, are scarce. 

Norms 

The standardization program is described adequately. The final norms 
are based on a sample of 2?, 699 students spread rather evenly over the 
four grades (9-12). Participating schools were selected from nine geo- 
graphic regions, including all 50 states. An attempt was made to insure 
proper socioeconomic representation; however, the manual is a little 
unclear in regard to this issue. Norms are expressed in terms of a standard 
score with a mean of 50 and a standard deviation of 10, percentiles, and 
stanines. The meaning of each of these statistics is discussed in clear 
language. For Reading, three norm-groups are provided: the total 
standardization group by grade; the subset of students from the total group who 
expressed intent to pursue college work; and a so-called ability group 
where "ability" is defined in terms of stanine score on the Otis Quick 
Scoring Mental Ability Test: Gamma Test. The approach taken by the 
authors allows a very extensive interpretation of a student's score. Yet, 
there are a few questions. The manual points out that the norms are based 
on performance at the beginning of the second semester. If a user tests at a 
much earlier or much later time, it is suggested that adjustments should be 
made. While the suggestion is correct, it seems meaningless in terms of the 
course of action an individual user can take. How is he going to "adjust"? 
Also, using the Otis Test as a measure of "ability" is questionable. Scores 
on this test are for the most part as much a function of "achievement" as 
are the reading scores themselves. This fact becomes evident from the 
systematic increase of mean scores with increase in grade (Table 3, p. 12). 
In this respect it is also relevant that if one estimates the reliability of the 
Otis at .90, the correlation between Reading and IQ, corrected for 
unreliability in both tests, becomes virtually perfect. 
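
The last point rests on the classical correction for attenuation, in which an observed correlation is divided by the square root of the product of the two reliabilities. The sketch below takes the .90 assumed above for the Otis and, as a further assumption, uses the same figure for Reading. The observed correlations fed into it are hypothetical, since the manual's own value is not quoted in this review.

```python
import math


def disattenuate(r_xy, rel_x, rel_y):
    """Correlation corrected for unreliability in both measures."""
    return r_xy / math.sqrt(rel_x * rel_y)


REL_READING = 0.90   # assumed; the manual reports coefficients above .90
REL_OTIS = 0.90      # the estimate used in the review

for r_observed in (0.80, 0.85, 0.88):   # hypothetical observed correlations
    corrected = disattenuate(r_observed, REL_READING, REL_OTIS)
    print(r_observed, round(corrected, 2))
```

With reliabilities near .90, any observed correlation in the upper .80s is pushed close to 1.00 once corrected, which is the sense in which the two tests would be measuring virtually the same thing.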

In general, however, the norms and the suggestions for their use are well 
developed, well presented, and most useful. 



Reliability 

The reliability data reported are as complete as any user would desire. 
All coefficients exceed .90. In addition, the standard error of measurement 
is given in terms of T-scores, and its use is discussed. This is a most 
desirable feature. The fact that the standard errors are reported only for 
the combined grades is of little importance since neither standard devia- 
tion nor reliabilities vary much across grades. The standard error given, 
therefore, is safe for use in all grades. 
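
As an illustration of how the standard error of measurement follows from the figures already given, the usual formula is the standard deviation times the square root of one minus the reliability. The sketch assumes the standard-score scale with a standard deviation of 10 described under Norms and takes .90 as the floor for the reliability. The other reliabilities, the example score of 55, and the helper `score_band` are hypothetical and do not come from the manual.

```python
import math


def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def score_band(score, sd, reliability, n_sem=1.0):
    """Band of +/- n_sem standard errors around an obtained score."""
    half_width = n_sem * sem(sd, reliability)
    return (score - half_width, score + half_width)


SD = 10  # standard-score scale described under Norms

for rel in (0.90, 0.93, 0.96):   # 0.93 and 0.96 are hypothetical values
    print(rel, round(sem(SD, rel), 2))

low, high = score_band(55, SD, 0.90)   # 55 is an arbitrary example score
print(round(low, 1), round(high, 1))
```

On these assumptions the standard error cannot exceed roughly 3.2 standard-score points, and it shrinks only slowly as reliability rises, which is consistent with the remark that one pooled value is safe for all grades.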



Validity 

According to the authors, of the various kinds of validity discussed, 
content validity is "the most important and directly relevant" (p. 15). This 
reviewer agrees. It is, therefore, disappointing that after a very good 
definition of content validity, the only evidence presented consists of a 
table which classifies the items by the nature of the materials from which 
the passages have been drawn. No information is given regarding the 
"behavior," or, as the manual puts it, the "skills, knowledges, and under- 
standings" tapped by the test. As is the case with many other reading tests, 
the easy way out has been chosen by practically redefining "content 
validity" in terms of curricular materials only. An analysis of the tests 
themselves reveals the narrow scope of what the test measures. First of all, 
"reading" is equated with paragraph comprehension. "Paragraph compre- 
hension" is defined in terms of filling in missing words and answering 
multiple-choice questions. The content validity of the first task is obscure 
though its correlation with question-type tasks is well established. The 
content validity of the question-type tasks depends on the nature of their 
questions. Test writers should present some kind of analysis of their 
questions in an attempt to facilitate the users' judgment of the validity of 
the test for their own purposes. It appears to this reviewer that the items 
in this test are the usual mixture of questions involving various cognitive 
levels of operation with a preponderance of items at the level of literal 
understanding. 

In regard to the issue of content validity and equivalence of forms it 
may be noted that the X and W forms differ considerably in regard to the 
type of tasks included. Form W has nine short passages with 23 words left 
out in all, i.e., 23 fill-in items; Form X has eight such stories with 29 fill-in 
items. Form W has six longer passages with 42 questions whereas Form X 
has five such passages with 36 questions. While the test forms were 
statistically equated, assurance of equal content validity cannot be given. 

No criterion-related-validity evidence is presented other than correla- 
tions with the other nine tests in the battery. In general, these correlations 
are high. Their meaning and the meaning of the resulting factor analysis 
cannot be adequately determined without additional data not normally 
part of the evidence reported in a test manual. 






In regard to the validity issue it may be mentioned that the publisher 
offers to make available to the users a continuously updated bibliography 
of materials related to the use of the test. The copy of this bibliography 
received by this reviewer upon his request, however, was compiled in 
1966. 

Summary 

Within the limits of its validity, the Stanford Achievement Test - High 
School Battery: Reading is a good test. Its manual in particular is excellent. 
It can be recommended without hesitation for those users who have 
measurement needs covered by this test: that is, those users for whom the 
test has validity. 

No technical excellence can void the fact that as a test of reading this 
test has a narrow scope. This scope can be best described as the largely 
literal understanding of short paragraphs of rather simple structure and 
filled with factual details. For those users who feel that the types of questions 
and tasks used in this test are those they would employ to measure 
reading, this test should be a serious candidate. Those users who want a 
more extensive and more complete inventory of a student's reading ability 
should not consider it. 

• Traxler High School Reading Test - Revised 

Reviewed by J. Jaap Tuinman 
Indiana University 

Name of Test: Traxler High School Reading Test - Revised 
Revision Date: 1967 
Subtests: Story Comprehension, Word Meaning, Paragraph Comprehension 
Publication Date: 1938 
Author: A. E. Traxler 
Publisher: Bobbs-Merrill 

Overview 

The Traxler High School Reading Test - Revised, an upward extension of 
the Traxler Silent Reading Test, is a revision of a test originally published 
in 1938. The test booklets for both forms (A, B), which have 1966 as a 
copyright date, carry the designation "for grades 10, 11, and 12." The 
Manual of Directions, which sets the date of revision at 1967, indicates 
that the test is intended for grades 9, 10, 11, and 12. In addition, norms 
for Grade 9 are provided. This discrepancy between manual and test 
booklets may cause some confusion. 

The test was constructed to measure 1) rate of continuous reading of 
material in the social and natural sciences, 2) comprehension of that 
material at the rate read, and 3) understanding of main ideas presented in 
paragraphs taken from high school texts in social studies and science. The 
test does not contain a vocabulary section. Instead, the author recom- 
mends a separate 15-minute vocabulary test. 

The 1967 revision does not differ much from the earlier version. The 
basic impetus for revision was the update of some obsolete items. The first 
part of both forms was virtually left unchanged. In the second part of 
Form A revisions were made in eight items, whereas in Form B four items 
were updated. 

The directions for administering and scoring the test are clear. The test 
takes 45 minutes in all. The time limits are very generous; most students 
will finish before time is called. The test can be either hand scored or 
machine scored. No test-specific answer sheets are provided, a condition 
which may be considered a plus point from an economic point of view. 

In all, five scores are obtained: rate (1), story comprehension (2), main 
ideas (3), total comprehension (2+3), and total score (1+2+3). The latter 
score is a little difficult to interpret: Why would one want to add rate (as 
defined in this test) and the comprehension scores? Fortunately, the 
author is rather hesitant in his recommendations about the use of this total 
reading score. This reviewer advises never to use it. Not only is this score 
rather meaningless as a concept, but even in a technical sense it has little to 
recommend it. The mean scores for Form B on the 1967 revision were as 
follows: rate, 35; story comprehension, 10; and main ideas, 19. These data 
mean that roughly 55 percent of the total score is accounted for by the 
rate component. These data also throw some light upon another setback of 
the rate score as it is defined in this test (unrelated to the comprehension 
score). An average student who does not read the passage at all can easily 
earn a total score of 62+5+19=86, which is far more than he would have 
obtained by conscientiously reading the story. (This hypothetical student 
is assigned the mean score for main ideas, 19.) Obviously, he must be 
smart enough to circle a word in the last line when time is called. 
Naturally, a set of scores as earned by this student should alert the user of 
the test. Such a student should be scheduled for special diagnostic atten- 
tion. The high total score, however, may work as a deterrent in the case of 
casual usage of the test. 
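
The arithmetic behind these two observations can be laid out in a short sketch. All figures are those quoted in the paragraph above; only the variable names are introduced here for illustration.

```python
# Form B mean scores on the 1967 revision, as quoted in the review.
mean_scores = {"rate": 35, "story_comprehension": 10, "main_ideas": 19}

mean_total = sum(mean_scores.values())
rate_share = mean_scores["rate"] / mean_total
print(mean_total, round(100 * rate_share, 1))   # total and rate's percentage share

# The reviewer's hypothetical student who marks the last line without reading.
non_reader = {"rate": 62, "story_comprehension": 5, "main_ideas": 19}
non_reader_total = sum(non_reader.values())
print(non_reader_total, non_reader_total - mean_total)
```

The rate component supplies 35 of the 64 points in the mean total, or about 55 percent, and the non-reading student's total of 86 exceeds the mean total by 22 points even though his story comprehension score is half the mean.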

Norms 

The norms for the Traxler High School Reading Test were obtained 
from a sample of 7,000 students from schools in the Eastern, Midwestern, 
Western, and Southern sections of the United States. The norming data 
were gathered for the 1938 edition. In the twelfth grade 1,164 students 
were included (the minimum) whereas 2,894 tenth graders were tested 
(the maximum). The description of norming population must be judged 
inadequate. Relevant information, e.g., socioeconomic criteria, is missing. 
No new norms were obtained for the revised edition. The author tried to 
determine whether such new norms were necessary. In the presentation of 
relevant data, information on the comparability of the revised B Form and 
the unrevised B Form is missing. With characteristic frankness the author 
concludes that the norms for the Story Comprehension may have to be 
somewhat adjusted. It is not clear how this could be done, however. 

Reliability 

The reliabilities reported for the subtests presumably are all corrected 
split-half correlations; the coefficient for the total test is based on the 
correlation between the two forms. All coefficients are based on extremely 
small groups of students (around 75). Comparison of the coefficients is 
made difficult by the fact that different grades were used to obtain the 
various subtest reliabilities. The reliabilities are estimated as follows: rate, 
.90; story comprehension, .72; main idea, .80; total comprehension, .86; 
and total reading, .91. The author is careful to point out that with the 
exception of rate, the reliabilities of the subtests are too low to allow the 
tests to be used for individual assessment. His suggestion, though, that the 
"total score on the test is satisfactory for use in predicting the reading 
achievement of individual pupil" seems not to coincide with the caution 
he urged elsewhere in the manual in regard to the use of this total score. 
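
The "correction" usually applied to a split-half correlation is the Spearman-Brown formula, which estimates the reliability of the full-length test from the correlation between its two halves. Whether the manual used exactly this formula is only presumed, as noted above. The sketch below works backward from the reported story comprehension coefficient of .72 to the half-test correlation that formula would imply; that implied value is an inference for illustration, not a figure from the manual.

```python
def spearman_brown(r_half):
    """Full-length reliability estimated from a half-test correlation."""
    return 2 * r_half / (1 + r_half)


def implied_half_correlation(r_full):
    """Half-test correlation that would yield r_full after correction."""
    return r_full / (2 - r_full)


reported = 0.72   # story comprehension reliability quoted in the review
r_half = implied_half_correlation(reported)
print(round(r_half, 4), round(spearman_brown(r_half), 2))
```

On this reading, the two halves of the story comprehension section would have correlated at only about .56 before correction, which underlines the caution against using the subtest for individual assessment.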

Validity 

No data on the validity of the test as such have been presented in the 
manual. The choice of passages may indicate that content validity in 
regard to the materials aspect of curriculum content in social studies and 
science may be assured. The same cannot be said of the behavior aspect of 
content validity, however. The questions in both sections seem to favor 
the lower cognitive skills. 

Whereas data on content validity, criterion-related validity, and con- 
struct validity are absent, the manual does contain a section on "item 
validity." The data presented indicate that the average item in the tests 
does effectively discriminate between subjects who scored low and those 
who scored high on the total test. While the use of the term "validity" in 
this context is common, it is also slightly misleading. No conclusions about 
the validity of the test itself should be drawn from the information in this 
section. 

Evaluation of Subtests 

Part I of the Traxler High School Reading Test contains a story 
accompanied by 20 questions. When opening the booklet, the student first 
sees the questions, printed upside down, and the opening lines of the 
story. He is asked to read the story first and to circle the word he was 
reading when the examiner says "mark"; this sign is given at 150 and 300 
seconds. The rate of reading is based on the position of the word marked. 
After completion of the story, the student turns back to the 20 questions. 
Though the instructions say "Read as fast as you can read understand- 
ingly but no faster, as you cannot work the exercises unless you know 
what you have read," there is some reason to believe that a larger-than- 
chance part of these questions can be answered without reading the story 
at all. Part of the reason may be that a large number of questions are based 
on only one selection of continuous prose. The number of questions 
answered right constitutes the Story Comprehension score, one of the two 
components of the Total Comprehension score. 

Part II consists of thirty social studies and science paragraphs, each 
accompanied by four statements. The student's task is to identify the 
statement which contains the main idea of the paragraph. In general, each 
of the statements after the paragraph represents a minor paraphrase of a 
sentence in the paragraph. Once the student has determined which of the 
sentences in the paragraph itself is most important, his task is easy. No 
inferences involving relations among sentences are called for. It is doubtful 
that reading the passage is necessary for determining the main ideas. In 
many instances the right answer can be found without reading the passages 
at all. (For what it is worth, this reviewer's sophomore assistant obtained 
scores of 83 percent and 80 percent of the items right when attempting 
these items without being able to see the passages.) 
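
To put the assistant's scores in perspective, the sketch below computes how likely such results would be from blind guessing. It assumes that all thirty four-choice items of Part II were attempted on each form and that a pure guess succeeds one time in four, so that 80 and 83 percent correspond to 24 and 25 items right. These assumptions, and the function name, are illustrative and are not stated in the manual.

```python
from math import comb


def prob_at_least(k, n, p):
    """Binomial probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_ITEMS = 30      # paragraphs in Part II
P_CHANCE = 0.25   # one correct statement out of four

print(N_ITEMS * P_CHANCE)   # items expected right by guessing alone
for items_right in (24, 25):
    print(items_right, prob_at_least(items_right, N_ITEMS, P_CHANCE))
```

Guessing alone would be expected to yield seven or eight items right, and the probability of reaching 24 or more by chance is vanishingly small, so the scores point to cues in the answer statements themselves rather than to luck.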

Summary 

Many of the criticisms herein can be leveled against almost any reading 
test currently on the market. Even so, the limited purpose of the test and 
its narrow definition of reading comprehension, in addition to the short- 
comings mentioned, make this instrument unsuitable for general use as a 
survey test of reading. The rate section can well be used if the user finds a 
way of controlling deception on the part of an occasional student. Asking 
easier questions which, however, should be unanswerable without reading 
the passage may offer a possible solution for this problem. The selection of 
the passages seems to make the test of interest to the teacher in social 
studies and science. The potential user, however, will have to determine 
for himself whether he can live with the type of questions asked as a 
measure of comprehension of materials in these subject areas. 

• The Traxler Silent Reading Test 

Reviewed by Joseph P. McKelpin 

Southern Association of Colleges and Schools 

Name of Test: Traxler Silent Reading Test 
Revision Date: 1942 
Time: 46 or 53 minutes 
Subtests: Reading Rate, Story Comprehension, Word Meaning, Paragraph Comprehension 
Publication Date: 1934 
Author: A. E. Traxler 
Publisher: Bobbs-Merrill 

Overview 

The Traxler Silent Reading Test was designed to measure four aspects of 
reading ability for students in grades 7 through 10: rate, story comprehen- 
sion, vocabulary, and paragraph comprehension. Forms 1 and 2 were first 
made available in 1934. Forms 3 and 4 were added in 1939 and 1942, 
respectively. The Traxler Reading Test has four subtests: Reading Rate, 
Story Comprehension, Word Meaning, and Paragraph Comprehension. 
When Forms 1 and 2 were first made available, Part III (Paragraph 
Comprehension) consisted of six paragraphs with three or four comple- 
tion-type questions for each paragraph. According to the publisher's 
manual, a small study was carried out in 1968 in order to change the 
completion questions in the Paragraph Comprehension part of Forms 1 
and 2 to multiple-choice items so that machine scoring could be used, if 
desired, with all parts of all forms of the test. The Paragraph Comprehen- 
sion part of Forms 3 and 4 was machine scorable when those forms were 
first made available. 






Examinee performance time is 46 minutes for the hand-scoring adminis- 
tration; for the machine-scoring administration, the examinee performance 
time is 53 minutes. Only on the test booklet for Form 1, Revised, do the 
directions include instructions to the student for use of a separate answer 
sheet. Since the directions on the other forms fail to include instructions 
for using a separate answer sheet for machine-scoring, the overall time 
requirements for administration may be greater for forms other than Form 
1, Revised. 

Norms 

The publisher's manual indicates that norms were derived from about 
25,000 pupils in grades 7, 8, 9, and 10. Apparently, results from grades 9 
and 10 of the Michigan statewide testing program in 1937 and l^W were 
combined with other available scores from schools elsewhere in the United 
States. The norms for Part III, Paragraph Comprehension, were adjusted in 
the 1969 revision on the basis of data collected for that edition. 

Reliability 

Equivalent forms reliability estimates are reported in the manual. They 
range from .01,^ for Story Comprehension to .950 for Total Score. The 
single comprehension estimates are the lowest ones in the set although 
estimates of combined and total comprehension seem highly reliable. For 
survey purposes, consequently, the test may be useful, but for diagnostic 
use with individual students it may be questionable. 

Validity 

Examination of the procedures and results of validity studies of the test 
indicates that it probably measures those aspects of reading ability se- 
lected. Still some questions do arise. The heavy weighting assigned to 
Reading Rate could result in yielding a total score out of proportion with 
the student's comprehension of what was read. The Inglis Test of English 
Vocabulary used as a criterion for word meaning may not have equal 
efficacy in all population groups. The use of 54 sixth grade students' 
composite scores to establish the validity of the comprehension tests for 
beginning seventh graders seems to stretch the permissible limits. 

Evaluation of Subtests 

The Reading Rate subtest requires the student to read material about 
animals as fast as is consistent with his being able to answer questions 
about the story later. In taking the subtest, the student is twice required to 
indicate the place he has reached in the reading: once at the beginning of 
100 seconds and again at the end of 200 seconds. Numbers in the right 
margin opposite each line of the story are used to translate amount read 
into rate. 

The Story Comprehension subtest measures the student's understanding 
of the story read in the Rate subtest. The student is to complete each of 
10 sentences by selecting one of five options. This subtest is to be taken 
immediately upon completing the Rate test. 

The Word Meaning subtest is a measure of vocabulary. The vocabulary 
words (underlined) are used in short sentences or phrases which are 
followed by five words or phrases. The student is to select the word or 
phrase whose meaning is most nearly like the meaning of the word 
underlined. 

The Paragraph Comprehension subtest is designed to measure the abil- 
ity to read material at varying levels of difficulty. The subtest consists of 
six paragraphs and 20 multiple-choice questions. The student is to read 
each paragraph and indicate his option for each of the three or four 
questions following it. 

Summary 

The Traxler Silent Reading Test seems to be a useful survey instrument 
to be employed with students in grades 7-10. The social class bias of 
some parts of the test, especially the Word Meaning, may reduce its 
efficacy for students from the lower social classes. 



APPENDIX 



The table of tests on the following pages provides quick reference to the
basic information on each test reviewed in this book. The index provides a
quick reference to the critiques of reading tests appearing in Buros'
Reading Tests and Reviews (Highland Park, New Jersey: Gryphon Press,
1968) and to Buros' Mental Measurements Yearbook (Highland Park, New
Jersey: Gryphon Press, 1938, 1940, 1949, 1953, 1959, 1965). These
excellent test reviews should be studied before a test consumer makes a
final test selection.

Within the table, tests are arranged alphabetically by name. The first six
columns are self-explanatory. The next two columns are the index. In the
first column of the index, the number to the left of the colon refers to the
volume number of the Mental Measurements Yearbook in which the review
appears; the numbers on the right of the colon refer to the review number
in that volume. The numbers in the second column of the index refer to
the page number in Reading Tests and Reviews on which the test is
described and/or reviewed.
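The volume-and-review notation of the first index column can be read mechanically. The sketch below assumes entries have the form "volume:review" with a single review number; the entry "6:810" is a made-up example, not a citation from the table, and the names are hypothetical.

```python
from typing import NamedTuple


class MMYRef(NamedTuple):
    """A reference into the Mental Measurements Yearbook."""
    volume: int   # number to the left of the colon
    review: int   # number to the right of the colon


def parse_mmy_ref(entry: str) -> MMYRef:
    """Split an index entry such as '6:810' into volume and review number."""
    volume_text, separator, review_text = entry.strip().partition(":")
    if not separator:
        raise ValueError(f"expected 'volume:review', got {entry!r}")
    return MMYRef(volume=int(volume_text), review=int(review_text))


# Hypothetical entry: review number 810 in the sixth yearbook.
ref = parse_mmy_ref("6:810")
```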

It should be noted that the reviews in Reading Tests and Reviews are the
same ones which have appeared in the MMYs. Both are listed here because
a test consumer may have access to only one of these references.



[Table of tests reviewed, pages 53-55: contents illegible in this copy.]



International Book Year 1972
UNESCO



The International Reading Association attempts, through its publications, to
provide a forum for a wide spectrum of opinion on reading. This policy permits
divergent viewpoints without assuming the endorsement of the Association.



